

Chapter 1: Preliminary Mathematical Concepts 

Mathematics provides powerful tools for establishing any theory or idea. Mathematical
modeling helps to define neural networks and their functionality: the basic operations of
neurons and their topological structures can easily be specified with its help. The
input/output patterns of a neural network can be represented as vectors, and the various
learning methods of neural networks are defined using mathematical tools such as
differential equations, gradient methods, matrix operations and many more. In this chapter
we discuss [1-7] the basic mathematical tools that are used most frequently to understand
neural network architectures and the learning techniques by which they accomplish the
task of pattern recognition.

1.1 Sets

A set is a collection of objects or elements sharing common characteristics or
properties. A set is denoted by listing its elements between braces. The set of positive
integers is {1, 2, 3, ...}. We also denote sets with the notation $\{x \mid \text{conditions on } x\}$ for sets
that are more easily described than enumerated. This is read as “the set of elements x
such that x satisfies ...”. $x \in S$ is the notation for “x is an element of the set S”. To express
the opposite we write $x \notin S$ for “x is not an element of the set S”.

The following definitions summarize the basic notions of sets:

• The empty set is the set with no elements. It is denoted { } or $\emptyset$.

• Two sets A and B are equal if they contain exactly the same elements. 

• Two sets A and B are said to be disjoint if they have no elements in common.




• A is a subset of B if every element of A is also in B; this is denoted $A \subseteq B$.

• The cardinal number or cardinality of a finite set is a non-negative integer
representing the number of elements in the set. It is denoted $|A|$.

• The complement of a set A is the set of all elements (of a special set S,
the space or universe) that are not in A. It is denoted $\bar{A}$.

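• The intersection of two sets A and B is defined as:

$$A \cap B = \{x \mid x \in A \text{ and } x \in B\}$$

• The union of two sets A and B is defined as:

$$A \cup B = \{x \mid x \in A \text{ or } x \in B\}$$

• The difference of two sets A and B is defined as:

$$A - B = \{x \mid x \in A \text{ and } x \notin B\}$$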

• A partition P of a set A is a collection of mutually exclusive subsets $A_i$ of A that
satisfy $A_i \cap A_j = \emptyset$ unless $i = j$, and $\bigcup_i A_i = A$.
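
The set operations above map directly onto Python's built-in `set` type. The following minimal sketch uses arbitrary example sets and assumes the universe S = {1, ..., 10}. The helper `is_partition` is an illustrative name of our own, not something defined in the text.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}
S = set(range(1, 11))        # assumed universe S = {1, ..., 10}

intersection = A & B         # {x | x in A and x in B}
union = A | B                # {x | x in A or x in B}
difference = A - B           # {x | x in A and x not in B}
complement_A = S - A         # elements of S that are not in A

print(A <= S, len(A), A.isdisjoint({7, 8}))   # True 4 True


def is_partition(parts, whole):
    """Check that the parts are mutually exclusive and that their union is `whole`."""
    for i, p in enumerate(parts):
        for q in parts[i + 1:]:
            if p & q:               # a shared element violates A_i ∩ A_j = ∅
                return False
    return set().union(*parts) == whole


print(is_partition([{1, 2}, {3}, {4}], A))    # True
print(is_partition([{1, 2}, {2, 3, 4}], A))   # False: 2 appears in two parts
```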

1.2 Relations 

Relations are based on the notion of set mapping and provide a mathematical formalism 
for the representation of structure. 
If A and B are sets, a relation from A to B is a subset of $A \times B$, where $A \times B$ denotes the
Cartesian product of the sets A and B. Given the sets

$A = \{a, b, c, \ldots\}$ and $B = \{x, y, z, \ldots\}$,

a relation from A to B, namely R, satisfies $R \subseteq A \times B$.

This is called a binary relation, since it involves only two sets, and it provides a way of
connecting or relating members of the sets. Relations can also be defined as sets of
ordered pairs $(x, y)$. Here the set of all possible values of x is called the domain, and the set of all
possible values of y is called the range. Relations have a direction or ordering. For a relation R
on a single set A (that is, $R \subseteq A \times A$), this directionality and ordering define the following
important properties:

• Reflexive: R is reflexive if $(a, a) \in R$ for all $a \in A$.

• Symmetric: R is symmetric if $(b, a) \in R$ for all $(a, b) \in R$.

• Transitive: R is transitive if $(a, b) \in R$ and $(b, c) \in R$ imply $(a, c) \in R$.

• Equivalence relation: A relation that satisfies all three of the above properties
is termed an equivalence relation.
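
For a finite set, each property can be checked directly from its definition. The short Python sketch below does this for relations stored as sets of ordered pairs. The function names and the two example relations are illustrative choices, not notation from the text.

```python
from itertools import product


def is_reflexive(R, A):
    """(a, a) in R for every a in A."""
    return all((a, a) in R for a in A)


def is_symmetric(R):
    """(b, a) in R whenever (a, b) in R."""
    return all((b, a) in R for (a, b) in R)


def is_transitive(R):
    """(a, d) in R whenever (a, b) and (b, d) are both in R."""
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)


def is_equivalence(R, A):
    return is_reflexive(R, A) and is_symmetric(R) and is_transitive(R)


A = {1, 2, 3, 4}
same_parity = {(a, b) for a, b in product(A, A) if (a - b) % 2 == 0}
less_than = {(a, b) for a, b in product(A, A) if a < b}

print(is_equivalence(same_parity, A))   # True
print(is_reflexive(less_than, A), is_symmetric(less_than), is_transitive(less_than))
# False False True
```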

1.3 Functions

A function f from A to B is a relation such that for every $a \in A$ there exists one and only one
$b \in B$ such that $(a, b) \in f$. This is usually written as

$$f: A \to B,$$

where A is the domain of the function f and B is its range (codomain). If $(a, b) \in f$, we say
that b is the function value of a, and write

$$b = f(a)$$

If $b = f(a)$, then we can write $a = f^{-1}(b)$, where $f^{-1}$ is the inverse of f. If $b = f(a)$ is a
one-to-one function, then $f^{-1}(b)$ is also a one-to-one function. In this case
$a = f^{-1}(f(a))$ and $b = f(f^{-1}(b))$ wherever both sides are defined. If
$b = f(a)$ is a many-to-one function, then $a = f^{-1}(b)$ is a one-to-many function; that is, $f^{-1}$ is a
multi-valued function.
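
For finite sets, a function can be stored as a lookup table. The sketch below is a hedged illustration, not a general-purpose library. It inverts such a table and returns the set of preimages for each value, so it shows directly how the inverse of a many-to-one function becomes multi-valued. The helper names and example functions are hypothetical.

```python
def inverse_relation(f):
    """Invert a finite function given as a dict {a: f(a)}.

    Each value b maps to the set of all a with f(a) = b, so a many-to-one f
    yields a one-to-many (multi-valued) inverse.
    """
    inv = {}
    for a, b in f.items():
        inv.setdefault(b, set()).add(a)
    return inv


def is_one_to_one(f):
    """f is one-to-one iff no two arguments share the same value."""
    return len(set(f.values())) == len(f)


square = {-2: 4, -1: 1, 0: 0, 1: 1, 2: 4}   # many-to-one on {-2, ..., 2}
double = {0: 0, 1: 2, 2: 4}                  # one-to-one

print(is_one_to_one(square), sorted(inverse_relation(square)[4]))   # False [-2, 2]
print(is_one_to_one(double))                                         # True

# For a one-to-one f every preimage set is a singleton, so f^{-1}(f(a)) == a.
f_inv = {b: next(iter(s)) for b, s in inverse_relation(double).items()}
print(all(f_inv[double[a]] == a for a in double))                    # True
```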

1.4 Scalars and Vectors 

A vector is a quantity having both magnitude and direction. Examples of vector
quantities are velocity, force and position. A vector in n-dimensional space can be
represented by an arrow whose initial point is at the origin (Figure 1.1). The magnitude is the
length of the vector.




Figure 1.1: Graphical representation of a vector in three dimensions. 

A scalar has only a magnitude. Examples of scalar quantities are mass, time and speed.

1.4.1 Vector Algebra 

Two vectors are equal if they have the same magnitude and direction. The negative of a
vector a, denoted $-a$, is a vector of the same magnitude as a but in the opposite direction.
We add two vectors a and b by placing the tail of b at the head of a and defining $a + b$
to be the vector with tail at the origin and head at the head of b, as shown in Figure 1.2.




Figure 1.2: Vector arithmetic 

The difference $a - b$ is defined as the sum of a and the negative of b, $a + (-b)$. The
result of multiplying a by a scalar $\alpha$ is a vector of magnitude $|\alpha|\,\|a\|$. It has the
same direction as a if $\alpha$ is positive and the opposite direction if $\alpha$ is negative.

Zero and Unit Vectors: The additive identity element for vectors is the zero vector or
null vector. This is the vector of magnitude zero, denoted 0. A unit vector is a
vector of magnitude one. If a is nonzero, then $a / \|a\|$ is a unit vector in the direction of a.
Unit vectors are often denoted with a caret, as in $\hat{n}$.
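
The vector operations just described can be sketched with plain Python tuples, assuming each vector is given by its Cartesian components. This is an illustrative sketch; the helper names `add`, `scale`, `magnitude` and `unit` are our own.

```python
import math


def add(a, b):
    """Head-to-tail vector addition, done component by component."""
    return tuple(x + y for x, y in zip(a, b))


def scale(alpha, a):
    """Multiply vector a by the scalar alpha."""
    return tuple(alpha * x for x in a)


def magnitude(a):
    """Euclidean length of a."""
    return math.sqrt(sum(x * x for x in a))


def unit(a):
    """a / |a|, the unit vector in the direction of a (a must be nonzero)."""
    m = magnitude(a)
    if m == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(x / m for x in a)


a = (3.0, 4.0, 0.0)
b = (1.0, -2.0, 2.0)
print(add(a, b))                 # (4.0, 2.0, 2.0)
print(add(a, scale(-1, b)))      # a - b = a + (-b) = (2.0, 6.0, -2.0)
print(magnitude(scale(-2, a)))   # |alpha| |a| = 2 * 5 = 10.0
print(unit(a))                   # (0.6, 0.8, 0.0)
```
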

Rectangular Unit Vectors: In n-dimensional Cartesian space, the unit vectors in the
directions of the coordinate axes are $e_1, e_2, \ldots, e_n$. These are called the rectangular unit
vectors.

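Linearly Dependent Vectors: A set of M-dimensional vectors $\{x_1, x_2, \ldots, x_N\}$ is said to
be linearly dependent if there exist numbers $\{c_1, c_2, \ldots, c_N\}$, not all 0, such that

$$c_1 x_1 + c_2 x_2 + \cdots + c_N x_N = 0$$

If no such numbers exist, the vectors are said to be linearly independent.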

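Inner Product: The inner product of two vectors $x, y \in R^M$, with $x = [x_1, x_2, \ldots, x_M]^T$ and
$y = [y_1, y_2, \ldots, y_M]^T$, is defined as

$$\langle x, y \rangle = x^T y = \sum_{i=1}^{M} x_i y_i \tag{1.1}$$

When the inner product $\langle x, y \rangle$ of two vectors is 0, the vectors are said to be orthogonal.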
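
The sketch below is a quick numerical illustration of (1.1) and of linear dependence. It computes an inner product and then checks a dependent pair of vectors. The vectors are arbitrary examples. The tolerance in `orthogonal` is an assumption that absorbs floating-point rounding.

```python
def inner(x, y):
    """<x, y> = sum_i x_i y_i, equation (1.1)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xi * yi for xi, yi in zip(x, y))


def orthogonal(x, y, tol=1e-12):
    """Treat |<x, y>| <= tol as zero to allow for floating-point rounding."""
    return abs(inner(x, y)) <= tol


x = [1.0, 2.0, -1.0]
y = [3.0, -1.0, 1.0]
print(inner(x, y), orthogonal(x, y))   # 0.0 True

# z = 2x, so c1 = 2, c2 = -1 (not all zero) give c1 x + c2 z = 0:
# the set {x, z} is linearly dependent.
z = [2.0, 4.0, -2.0]
combo = [2 * xi - zi for xi, zi in zip(x, z)]
print(combo)                            # [0.0, 0.0, 0.0]
```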


1.4.2 Kronecker Delta and Einstein Summation Convention 
The Kronecker delta is defined as

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \tag{1.2}$$

This notation is useful in our work with vectors.

Consider writing a vector in terms of its rectangular components. Instead of using
ellipses, $a = a_1 e_1 + \cdots + a_n e_n$, we could write the expression as a sum,
$a = \sum_{i=1}^{n} a_i e_i$, or simply as $a = a_i e_i$. Here it is understood that whenever an index is
repeated in a term we sum over that index from 1 to n. This is the Einstein summation convention.
A repeated index is called a summation index or a dummy index. Other indices can take any value
from 1 to n and are called free indices. For example, in $\delta_{ij} a_j = a_i$ the index j is summed and
i is free; only the term with $j = i$ survives the sum.
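
The summation convention is shorthand for explicit loops over the dummy index. The sketch below assumes n = 3 and an arbitrary example vector. It spells out both $a = a_j e_j$ and $\delta_{ij} a_j = a_i$ as loops.

```python
n = 3


def delta(i, j):
    """Kronecker delta, equation (1.2)."""
    return 1 if i == j else 0


a = [2.0, -1.0, 5.0]
# Rectangular unit vectors e_j stored as rows: (e_j)_k = delta(j, k)
e = [[delta(j, k) for k in range(n)] for j in range(n)]

# a = a_j e_j : sum over the repeated (dummy) index j, component k is free
rebuilt = [sum(a[j] * e[j][k] for j in range(n)) for k in range(n)]

# delta_ij a_j = a_i : j is summed, i is free
contracted = [sum(delta(i, j) * a[j] for j in range(n)) for i in range(n)]

print(rebuilt == a, contracted == a)   # True True
```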

1.5 Matrix 

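A matrix is a rectangular array of values arranged in rows and columns. Here is a
matrix A of size $m \times n$:

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix}$$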

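The sum or difference of two matrices A and B is defined by adding or
subtracting corresponding elements:

$$A \pm B = \begin{pmatrix} a_{11} \pm b_{11} & a_{12} \pm b_{12} & \cdots & a_{1n} \pm b_{1n} \\ a_{21} \pm b_{21} & a_{22} \pm b_{22} & \cdots & a_{2n} \pm b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} \pm b_{m1} & a_{m2} \pm b_{m2} & \cdots & a_{mn} \pm b_{mn} \end{pmatrix}$$

The sum or difference is undefined if the matrices have different sizes.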

The product of two matrices is defined only when the number of columns of the
first matrix equals the number of rows of the second matrix. That is, the product of two
matrices A and B exists only when A is of order $i \times j$ and B is of order
$j \times k$. The product matrix is then of order $i \times k$, with elements

$$(AB)_{ik} = \sum_{j} a_{ij} b_{jk} \tag{1.3}$$
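
Equation (1.3) translates directly into nested loops. The following pure-Python sketch uses no external libraries and arbitrary example matrices. It checks the size conditions stated above before computing a sum, difference or product.

```python
def matmul(A, B):
    """(AB)_ik = sum_j a_ij b_jk, equation (1.3); A is i x j and B is j x k."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]


def matadd(A, B, sign=1):
    """A + B (sign = 1) or A - B (sign = -1), element by element."""
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        raise ValueError("sum/difference undefined for different sizes")
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, -1]]            # 3 x 2
print(matmul(A, B))      # 2 x 2 result: [[7, -1], [16, -1]]
print(matadd(A, A, -1))  # [[0, 0, 0], [0, 0, 0]]
```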


The identity matrix I has elements $I_{ij}$ such that $I_{ij} = 1$ when $i = j$ and $I_{ij} = 0$ otherwise, that is,
$I_{ij} = \delta_{ij}$. The transpose of a matrix $A = [a_{ij}]$ is defined as

$$A^T = [a_{ji}] \tag{1.4}$$


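The inverse of a square matrix A is defined as

$$A^{-1} = \frac{\operatorname{Adj}(A)}{\det(A)} \tag{1.5}$$

where Adj(A), the adjugate of A, is the transpose of its cofactor matrix. The inverse exists only if $\det(A) \neq 0$.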


The rank of a matrix $A \in R^{M \times N}$ is defined as the number of linearly independent rows (equivalently,
columns) of A. If A is of full rank, then its rank is M or N, whichever is lower.

The trace of a square matrix is defined as the sum of its diagonal elements:

$$\operatorname{trace}(A) = \sum_{i=1}^{n} a_{ii} \tag{1.6}$$
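
For small matrices, (1.4) to (1.6) can be computed directly. The sketch below uses cofactor expansion for the determinant and the adjugate for the inverse. This is a didactic choice: it becomes exponentially slow for large n, where factorization-based library routines are normally used instead. The function names are illustrative.

```python
def transpose(A):
    """A^T = [a_ji], equation (1.4)."""
    return [list(col) for col in zip(*A)]


def trace(A):
    """Sum of the diagonal elements, equation (1.6)."""
    return sum(A[i][i] for i in range(len(A)))


def minor(A, i, j):
    """A with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Cofactor (Laplace) expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def mat_inverse(A):
    """A^{-1} = Adj(A) / det(A), equation (1.5); Adj(A) is the transposed cofactor matrix."""
    n, d = len(A), det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    if n == 1:
        return [[1 / d]]
    cof = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)] for i in range(n)]
    return [[c / d for c in row] for row in transpose(cof)]


A = [[4.0, 7.0],
     [2.0, 6.0]]
print(transpose(A), trace(A), det(A))   # [[4.0, 2.0], [7.0, 6.0]] 10.0 10.0
print(mat_inverse(A))                   # [[0.6, -0.7], [-0.2, 0.4]]
```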

1.5.1 Pseudo-inverse

The pseudo-inverse $A^+$, also called the Moore-Penrose generalized inverse, of a matrix
$A \in R^{m \times n}$ is the unique matrix that satisfies

$$AA^+A = A$$
$$A^+AA^+ = A^+$$
$$(AA^+)^T = AA^+$$
$$(A^+A)^T = A^+A$$

$A^+$ can be calculated by

$$A^+ = (A^TA)^{-1}A^T \tag{1.7}$$

if $A^TA$ is nonsingular, and

$$A^+ = A^T(AA^T)^{-1} \tag{1.8}$$

if $AA^T$ is nonsingular. The pseudo-inverse is directly associated with linear least-squares (LS)
problems: when $A^TA$ is nonsingular, $x = A^+b$ minimizes $\|Ax - b\|_2$.

When A is a square nonsingular matrix, the pseudo-inverse $A^+$ is its inverse $A^{-1}$. For a
scalar $\alpha$, $\alpha^+ = \alpha^{-1}$ if $\alpha \neq 0$, and $\alpha^+ = 0$ if $\alpha = 0$.
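
The following is a small worked example of (1.7). It assumes A has full column rank, so that $A^TA$ is nonsingular. The sketch computes $A^+$ for a 3 × 2 matrix and uses it to obtain the least-squares solution of an overdetermined system. The Penrose conditions can then be checked numerically. The matrix and right-hand side are arbitrary examples, and the exact LS solution here is x = (4/3, 7/3).

```python
def transpose(A):
    return [list(c) for c in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inv2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula (1.5)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("A^T A is (nearly) singular; (1.7) does not apply")
    return [[d / det, -b / det], [-c / det, a / det]]


def pinv_tall(A):
    """A^+ = (A^T A)^{-1} A^T, equation (1.7); assumes A has exactly 2 independent columns."""
    At = transpose(A)
    return matmul(inv2(matmul(At, A)), At)


A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
A_plus = pinv_tall(A)

# Least-squares solution of the overdetermined system A x = b
b = [[1.0], [2.0], [4.0]]
x_ls = matmul(A_plus, b)
print(x_ls)   # approximately [[1.3333], [2.3333]]
```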

1.5.2 Vector Norms and Matrix Norms

A norm acts as a measure of distance. A vector norm on $R^n$ is a mapping $f: R^n \to R$ that
satisfies the following properties for any $x, y \in R^n$ and $\alpha \in R$:

• $f(x) \ge 0$, and $f(x) = 0$ iff $x = 0$;

• $f(\alpha x) = |\alpha| f(x)$;

• $f(x + y) \le f(x) + f(y)$.

The mapping is denoted $f(x) = \|x\|$.

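The p-norm or $L_p$-norm is a popular class of vector norms:

$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \tag{1.9}$$

with $p \ge 1$. Usually, the $L_1$, $L_2$ and $L_\infty$ norms are the most useful:

$$\|x\|_1 = \sum_{i=1}^{n} |x_i| \tag{1.10}$$

$$\|x\|_2 = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \tag{1.11}$$

$$\|x\|_\infty = \max_{1 \le i \le n} |x_i| \tag{1.12}$$

The $L_2$-norm is the popular Euclidean norm.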
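
The norms (1.9) to (1.12) take only a few lines of Python. Here `p_norm` is an illustrative helper that treats `float('inf')` as the $L_\infty$ case. The example vectors are arbitrary.

```python
def p_norm(x, p):
    """||x||_p from (1.9); p = float('inf') gives the max norm (1.12)."""
    if p == float("inf"):
        return max(abs(v) for v in x)
    if p < 1:
        raise ValueError("p must be >= 1 for (1.9) to define a norm")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


x = [3.0, -4.0, 0.0]
y = [1.0, 2.0, -2.0]
print(p_norm(x, 1), p_norm(x, 2), p_norm(x, float("inf")))   # 7.0 5.0 4.0

# Triangle inequality ||x + y|| <= ||x|| + ||y|| for several p
s = [a + b for a, b in zip(x, y)]
print(all(p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) for p in (1, 2, 3, float("inf"))))  # True
```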

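A matrix norm is a generalization of the vector norm obtained by extending $R^n$ to $R^{m \times n}$.

For a matrix $A = [a_{ij}] \in R^{m \times n}$, the most frequently used matrix norms are the Frobenius norm

$$\|A\|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 \right)^{1/2} \tag{1.13}$$

and the matrix p-norm

$$\|A\|_p = \sup_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p \tag{1.14}$$

where sup denotes the supremum.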

The matrix 2-norm and the Frobenius norm are invariant with respect to orthogonal
transforms. That is, for all orthogonal $Q_1$ and $Q_2$ of appropriate dimensions,

$$\|Q_1 A Q_2\|_F = \|A\|_F \tag{1.15}$$

$$\|Q_1 A Q_2\|_2 = \|A\|_2 \tag{1.16}$$
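
The following is a numerical check of (1.13) and (1.15). It assumes 2 × 2 rotation matrices as the orthogonal transforms $Q_1$ and $Q_2$, with arbitrary angles. The Frobenius norm comes out unchanged by the rotations.

```python
import math


def frobenius(A):
    """||A||_F = sqrt(sum_ij a_ij^2), equation (1.13)."""
    return math.sqrt(sum(a * a for row in A for a in row))


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rotation(theta):
    """A 2 x 2 rotation matrix is orthogonal: Q^T Q = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


A = [[1.0, 2.0], [3.0, 4.0]]
Q1, Q2 = rotation(0.3), rotation(-1.1)
print(frobenius(A))                            # sqrt(30), about 5.4772
print(frobenius(matmul(matmul(Q1, A), Q2)))    # the same value, as (1.15) states
```
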

1.6 Decomposition

1.6.1 Eigenvalue Decomposition

Given a square matrix $A \in R^{n \times n}$, suppose there exist a scalar $\lambda$ and a nonzero vector v such that

$$Av = \lambda v \tag{1.17}$$

Then $\lambda$ and v are, respectively, called an eigenvalue of A and its corresponding
eigenvector. All the eigenvalues $\lambda_i$, $i = 1, \ldots, n$, can be obtained by solving the
characteristic equation

$$\det(A - \lambda I) = 0 \tag{1.18}$$

where I is the $n \times n$ identity matrix. The set of all the eigenvalues is called the spectrum of
A.

If A is nonsingular, all $\lambda_i \neq 0$. If A is symmetric, then all the $\lambda_i$ are real.

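For a symmetric matrix A, the maximum and minimum eigenvalues satisfy the Rayleigh quotient

$$\lambda_{\max}(A) = \max_{v \neq 0} \frac{v^T A v}{v^T v}, \qquad \lambda_{\min}(A) = \min_{v \neq 0} \frac{v^T A v}{v^T v} \tag{1.19}$$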


The trace of a matrix is equal to the sum of all its eigenvalues, and the determinant of a
matrix is equal to the product of its eigenvalues:

$$\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i \tag{1.20}$$

$$|A| = \prod_{i=1}^{n} \lambda_i \tag{1.21}$$

1.6.2 Singular Value Decomposition 
For a matrix $A \in R^{m \times n}$, there exist real orthogonal (unitary) matrices $U = [u_1, u_2, \ldots, u_m] \in R^{m \times m}$ and
$V = [v_1, v_2, \ldots, v_n] \in R^{n \times n}$ such that

$$U^T A V = \Sigma \tag{1.22}$$

where $\Sigma \in R^{m \times n}$ is a real pseudodiagonal $m \times n$ matrix. It has $\sigma_i$, $i = 1, 2, \ldots, p$,
$p = \min(m, n)$, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$, on the diagonal and zeros off the diagonal. The $\sigma_i$ are
called the singular values of A, and $u_i$ and $v_i$ are, respectively, called the left singular vector
and right singular vector for $\sigma_i$. They satisfy the relations

$$Av_i = \sigma_i u_i \quad \text{and} \quad A^T u_i = \sigma_i v_i.$$

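Accordingly, A can be written as

$$A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T \tag{1.23}$$

where r is the index of the smallest nonzero singular value, i.e., the number of nonzero singular values.
In the special case when A is a symmetric non-negative definite matrix,
$\Sigma = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$. Here $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ are the real eigenvalues of A, and
the $v_i$ are the corresponding eigenvectors.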

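The SVD is useful in many situations. The rank of A is equal to the number of
nonzero singular values. When $U = V$, as in the symmetric non-negative definite case above, the powers
of A can easily be calculated by $A^k = U \Sigma^k U^T$, where k is a positive integer.

The SVD is extensively applied in linear inverse problems. The pseudo-inverse of A can
be written as $A^+ = V \Sigma^+ U^T = \sum_{i=1}^{r} \sigma_i^{-1} v_i u_i^T$. Here $\Sigma^+$ is the $n \times m$ matrix with $1/\sigma_i$ in place of
each nonzero $\sigma_i$. This agrees with equation (1.7) when $A^TA$ is nonsingular.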

The Frobenius norm can thus be calculated as

$$\|A\|_F = \left( \sum_{i=1}^{p} \sigma_i^2 \right)^{1/2} \tag{1.24}$$

and the matrix 2-norm is calculated by

$$\|A\|_2 = \sigma_1 \tag{1.25}$$
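
Since the $\sigma_i^2$ are the eigenvalues of the symmetric matrix $A^TA$, the singular values of a 2 × 2 matrix can be found with a closed-form quadratic. The sketch below is restricted to the 2 × 2 case and uses an arbitrary example matrix. It checks (1.24) and compares (1.25) with a brute-force evaluation of (1.14) over unit vectors.

```python
import math


def singular_values_2x2(A):
    """sigma_i = sqrt(lambda_i(A^T A)); A^T A is symmetric non-negative definite."""
    (a, b), (c, d) = A
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d   # A^T A = [[p, q], [q, r]]
    disc = math.sqrt((p - r) ** 2 + 4 * q * q)
    lam1, lam2 = (p + r + disc) / 2, max((p + r - disc) / 2, 0.0)
    return math.sqrt(lam1), math.sqrt(lam2)


A = [[3.0, 0.0], [4.0, 5.0]]
s1, s2 = singular_values_2x2(A)
fro = math.sqrt(sum(x * x for row in A for x in row))

# Brute-force ||A||_2 = max over unit vectors x of ||Ax||, from (1.14)
two_norm = max(
    math.hypot(A[0][0] * math.cos(t) + A[0][1] * math.sin(t),
               A[1][0] * math.cos(t) + A[1][1] * math.sin(t))
    for t in (k * 2 * math.pi / 20000 for k in range(20000)))

print(s1, s2)                                 # sqrt(45) ~ 6.708 and sqrt(5) ~ 2.236
print(math.isclose(fro, math.hypot(s1, s2)))  # True: equation (1.24)
print(abs(two_norm - s1) < 1e-6)              # True: equation (1.25)
```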

1.6.3 QR Decomposition 

Although the quantity $(A^TA)^{-1}$ exists, significant numerical difficulties may occur in
computing this inverse in instances where $A^TA$ is nearly singular. The full-rank, overdetermined
linear LS case, $m > n$, can also be solved by using the QR decomposition procedure.

A is first factorized as

$$A = QR \tag{1.26}$$

where Q is an $m \times m$ orthogonal matrix, that is, $Q^TQ = I$, and
$R = \begin{bmatrix} \bar{R} \\ 0 \end{bmatrix}$ is an $m \times n$ upper
triangular matrix with $\bar{R} \in R^{n \times n}$.

Inserting (1.26) into the set of linear equations (SLE) $Ax = b$ and premultiplying by $Q^T$,
we have

$$Rx = Q^T b \tag{1.27}$$

Denoting $Q^T b = \begin{bmatrix} \bar{b} \\ \tilde{b} \end{bmatrix}$, where $\bar{b} \in R^n$
and $\tilde{b} \in R^{m-n}$, we have

$$\bar{R} x = \bar{b} \tag{1.28}$$

Since $\bar{R}$ is a triangular matrix, x can easily be solved using backward substitution. This is
also the procedure used when the QR factors are obtained by Gram-Schmidt orthonormalization (GSO).

When rank(A) < n, the rank-deficient LS problem has an infinite number of solutions. In that case
the QR decomposition does not necessarily produce an orthonormal basis for
$\operatorname{range}(A) = \{y \in R^m : y = Ax \text{ for some } x \in R^n\}$. QR decomposition with column pivoting
can be applied to produce an orthonormal basis for range(A).

The QR decomposition is a basic method for computing the SVD. The QR decomposition
itself can be computed by means of Givens rotations, the Householder transform, or
the GSO.
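
The following is a minimal sketch of the LS procedure of (1.26) to (1.28), assuming A has full column rank. It computes only the "thin" factors (the first n columns of Q and the block $\bar{R}$) using classical Gram-Schmidt, and then back-substitutes. Classical Gram-Schmidt is the simplest of the three methods named above but is less robust numerically than Householder or Givens. The example system is arbitrary; its exact LS solution is x = (4/3, 7/3).

```python
import math


def thin_qr(A):
    """Classical Gram-Schmidt on the columns of A (m x n, full column rank).
    Returns Q1 (m x n, orthonormal columns) and upper-triangular R_bar (n x n)."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    q, R = [], [[0.0] * n for _ in range(n)]
    for j, a in enumerate(cols):
        v = a[:]
        for k in range(j):
            R[k][j] = sum(qi * ai for qi, ai in zip(q[k], a))     # projection coefficient
            v = [vi - R[k][j] * qi for vi, qi in zip(v, q[k])]    # remove that component
        R[j][j] = math.sqrt(sum(vi * vi for vi in v))
        if R[j][j] < 1e-12:
            raise ValueError("columns are (nearly) linearly dependent")
        q.append([vi / R[j][j] for vi in v])
    Q1 = [[q[j][i] for j in range(n)] for i in range(m)]
    return Q1, R


def back_substitute(R, y):
    """Solve R x = y for upper-triangular R, as in (1.28)."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x


def lstsq_qr(A, b):
    Q1, R = thin_qr(A)
    b_bar = [sum(Q1[i][j] * b[i] for i in range(len(b))) for j in range(len(R))]  # Q1^T b
    return back_substitute(R, b_bar)


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
print(lstsq_qr(A, b))   # approximately [1.3333, 2.3333]
```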

1.6.4 Condition Numbers 

The condition number of a matrix $A \in R^{m \times n}$ is defined by

$$\operatorname{cond}_p(A) = \|A\|_p \, \|A^+\|_p \tag{1.29}$$

where p can be selected as 1, 2, ∞, Frobenius, or any other norm. The relation $\operatorname{cond}(A) \ge 1$
always holds. Matrices with small condition numbers are well conditioned, while
matrices with large condition numbers are poorly conditioned or ill-conditioned. The
condition number is especially useful in numerical computation, where ill-conditioned
matrices are sensitive to rounding errors. For the $L_2$-norm,

$$\operatorname{cond}_2(A) = \frac{\sigma_1}{\sigma_p} \tag{1.30}$$

where $p = \min(m, n)$.
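
This sketch applies (1.30) to the 2 × 2 case. For brevity, the singular values are computed from the eigenvalues of $A^TA$. Forming $A^TA$ squares the condition number, which is exactly the numerical concern raised in Section 1.6.3. Even so, for the nearly singular example below this approach is still accurate to several digits. The matrices are arbitrary examples.

```python
import math


def cond2_2x2(A):
    """cond_2(A) = sigma_1 / sigma_p, equation (1.30), for a 2 x 2 matrix."""
    (a, b), (c, d) = A
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d   # entries of A^T A
    disc = math.sqrt((p - r) ** 2 + 4 * q * q)
    s1 = math.sqrt((p + r + disc) / 2)
    s2 = math.sqrt(max((p + r - disc) / 2, 0.0))
    return math.inf if s2 == 0.0 else s1 / s2


well = [[2.0, 0.0], [0.0, 1.0]]
ill = [[1.0, 1.0], [1.0, 1.0001]]
print(cond2_2x2(well))   # 2.0
print(cond2_2x2(ill))    # roughly 4e4: nearly singular, hence ill-conditioned
```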

1.7 Optimization Methods
1.7.1 Vector Gradient

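Let g be a differentiable scalar function of m variables,

$$g = g(w) = g(w_1, \ldots, w_m) \tag{1.31}$$

where $w = (w_1, \ldots, w_m)^T$. Then the vector gradient of g w.r.t. w is the m-dimensional
vector of partial derivatives of g,

$$\frac{\partial g}{\partial w} = \nabla g = \nabla_w g = \begin{pmatrix} \frac{\partial g}{\partial w_1} \\ \vdots \\ \frac{\partial g}{\partial w_m} \end{pmatrix} \tag{1.32}$$

Similarly, we can define the second-order gradient, or Hessian matrix,

$$\frac{\partial^2 g}{\partial w^2} = \begin{pmatrix} \frac{\partial^2 g}{\partial w_1^2} & \cdots & \frac{\partial^2 g}{\partial w_1 \partial w_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 g}{\partial w_m \partial w_1} & \cdots & \frac{\partial^2 g}{\partial w_m^2} \end{pmatrix} \tag{1.33}$$

Generalization to vector-valued functions $g(w) = (g_1(w), \ldots, g_n(w))^T$ leads to the
definition of the Jacobian matrix of g w.r.t. w,

$$\frac{\partial g}{\partial w} = \begin{pmatrix} \frac{\partial g_1}{\partial w_1} & \cdots & \frac{\partial g_n}{\partial w_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial g_1}{\partial w_m} & \cdots & \frac{\partial g_n}{\partial w_m} \end{pmatrix} \tag{1.34}$$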

In this vector convention, the columns of the Jacobian matrix are the gradients of the
corresponding component functions w.r.t. the vector w.
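
Gradients and Jacobians are commonly checked numerically with finite differences. The sketch below approximates (1.32) with central differences and assembles the Jacobian in the column convention of (1.34). The step size h and the test functions are assumptions chosen for illustration.

```python
def grad(g, w, h=1e-6):
    """Central-difference approximation of the vector gradient (1.32)."""
    out = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        out.append((g(wp) - g(wm)) / (2 * h))
    return out


def jacobian(gvec, w, h=1e-6):
    """Jacobian (1.34) in this chapter's convention: column k is the gradient of g_k."""
    n_out = len(gvec(w))
    cols = [grad(lambda v, k=k: gvec(v)[k], w, h) for k in range(n_out)]
    return [[cols[k][i] for k in range(n_out)] for i in range(len(w))]   # m x n


def g(w):
    # analytic gradient: (2 w1 + 3 w2, 3 w1)
    return w[0] ** 2 + 3 * w[0] * w[1]


def gv(w):
    # two component functions g1 = w1 w2 and g2 = w1 + w2^2
    return [w[0] * w[1], w[0] + w[1] ** 2]


w0 = [1.0, 2.0]
print(grad(g, w0))        # approximately [8.0, 3.0]
print(jacobian(gv, w0))   # approximately [[2.0, 1.0], [1.0, 4.0]]
```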


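The differentiation rules are analogous to the case of ordinary functions:

$$\frac{\partial f(w) g(w)}{\partial w} = \frac{\partial f(w)}{\partial w} g(w) + f(w) \frac{\partial g(w)}{\partial w} \tag{1.35}$$

$$\frac{\partial}{\partial w} \left( \frac{f(w)}{g(w)} \right) = \frac{\frac{\partial f(w)}{\partial w} g(w) - f(w) \frac{\partial g(w)}{\partial w}}{g(w)^2} \tag{1.36}$$

$$\frac{\partial f(g(w))}{\partial w} = f'(g(w)) \frac{\partial g(w)}{\partial w} \tag{1.37}$$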


1.7.2 Matrix Gradient 

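Consider a scalar-valued function g of the $m \times n$ matrix $W = [w_{ij}]$ (e.g., the determinant of a
square matrix). The matrix gradient w.r.t. W is a matrix of the same dimension as W, consisting
of the partial derivatives of g w.r.t. the components of W:

$$\frac{\partial g}{\partial W} = \begin{pmatrix} \frac{\partial g}{\partial w_{11}} & \cdots & \frac{\partial g}{\partial w_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial g}{\partial w_{m1}} & \cdots & \frac{\partial g}{\partial w_{mn}} \end{pmatrix} \tag{1.38}$$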


1.7.3 Taylor Expansion of Multivariate Functions 

The well-known formula for the Taylor series expansion of a scalar function g(w) reads

$$g(w') = g(w) + \frac{dg}{dw}(w' - w) + \frac{1}{2}\frac{d^2 g}{dw^2}(w' - w)^2 + \cdots = g(w) + \sum_{k=1}^{\infty} \frac{1}{k!} g^{(k)}(w)(w' - w)^k \tag{1.39}$$

This can be generalized to a function of m variables:

$$g(w') = g(w) + \left( \frac{\partial g}{\partial w} \right)^T (w' - w) + \frac{1}{2}(w' - w)^T \frac{\partial^2 g}{\partial w^2}(w' - w) + \cdots \tag{1.40}$$

with derivatives evaluated at w. Note that the second term is an inner product of the
gradient of g with the vector $w' - w$, and the third term is a quadratic form with the
Hessian matrix $\partial^2 g / \partial w^2$.

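Similarly, for a scalar function of a matrix variable:

$$g(W') = g(W) + \operatorname{trace}\left[ \left( \frac{\partial g}{\partial W} \right)^T (W' - W) \right] + \cdots \tag{1.41}$$

Reminder:

$$\operatorname{trace}(A) = \sum_{i=1}^{n} a_{ii} \tag{1.42}$$

i.e., the trace is defined as the sum of the diagonal elements of the matrix. The above formula
shows the first two terms of the Taylor expansion. It uses the extension of the definition of an
inner product to matrices,

$$\langle A, B \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} b_{ij} \tag{1.43}$$

This equals the trace term in (1.41), since

$$\operatorname{trace}(A^T B) = \sum_{j=1}^{n} \sum_{i=1}^{m} a_{ij} b_{ij} \tag{1.44}$$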
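
Identity (1.44) and the two-term expansion (1.41) can be checked numerically. The sketch assumes the hypothetical function $g(W) = \sum_{ij} w_{ij}^2$, whose matrix gradient is $2W$. For this g, the term dropped from the expansion is exactly $\sum_{ij} (\Delta w_{ij})^2$.

```python
def frob_inner(A, B):
    """<A, B> = sum_ij a_ij b_ij, equation (1.43)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def trace_AtB(A, B):
    """trace(A^T B), computed by forming A^T B explicitly."""
    At = [list(c) for c in zip(*A)]
    AtB = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in At]
    return sum(AtB[j][j] for j in range(len(AtB)))


def g(W):
    # hypothetical scalar function of a matrix; its matrix gradient is 2W
    return sum(w * w for row in W for w in row)


W = [[1.0, 2.0, 0.0], [-1.0, 0.5, 3.0]]
dW = [[1e-3, -2e-3, 5e-4], [0.0, 1e-3, -1e-3]]
W_new = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, dW)]
grad_W = [[2 * w for w in row] for row in W]

first_order = g(W) + trace_AtB(grad_W, dW)   # (1.41) truncated after two terms
print(g(W_new) - first_order)                # about 7.25e-6 = sum of dW_ij^2
```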


1.7.4 Unconstrained Optimization

Gradient descent is a method of minimizing a given cost or objective function
$J(w)$:

• Start at some initial point $w(0)$;

• Calculate the gradient of $J(w)$ at the current point;

• Move in the direction of the negative gradient, or steepest descent, by some distance;

• Repeat the above until consecutive points are sufficiently close.

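In mathematical notation the above procedure reads

$$\Delta w(t) = w(t) - w(t-1) = -\alpha(t) \left. \frac{\partial J(w)}{\partial w} \right|_{w = w(t-1)} \tag{1.45}$$

or

$$w(t) = w(t-1) - \alpha(t) \left. \frac{\partial J(w)}{\partial w} \right|_{w = w(t-1)} \tag{1.46}$$

where $\alpha(t)$ is the learning rate, or step size.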

• Gradient descent always moves downhill in a hilly landscape;

• Local minima can trap the movement;

• Initialization is important to avoid local minima;

• The choice of the learning rate is crucial for the speed of convergence.
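
Below is a minimal sketch of rule (1.46) with a constant learning rate. It is applied to the hypothetical quadratic cost $J(w) = (w_1 - 1)^2 + 10(w_2 + 2)^2$ and stops when consecutive points are close, as in the procedure above. The cost, step size and tolerance are illustrative assumptions.

```python
def gradient_descent(grad_J, w0, alpha=0.05, tol=1e-8, max_iter=10_000):
    """w(t) = w(t-1) - alpha * dJ/dw evaluated at w(t-1), equation (1.46)."""
    w = list(w0)
    for t in range(1, max_iter + 1):
        g = grad_J(w)
        w_new = [wi - alpha * gi for wi, gi in zip(w, g)]
        if max(abs(a - b) for a, b in zip(w_new, w)) < tol:   # consecutive points close
            return w_new, t
        w = w_new
    return w, max_iter


def grad_J(w):
    # J(w) = (w1 - 1)^2 + 10 (w2 + 2)^2, minimum at (1, -2)
    return [2 * (w[0] - 1), 20 * (w[1] + 2)]


w_star, iters = gradient_descent(grad_J, [5.0, 5.0])
print(w_star, iters)   # close to [1.0, -2.0]
```

For this J the update multiplies the error in $w_2$ by $1 - 20\alpha$. With $\alpha = 0.05$ that error vanishes in one step. With $\alpha = 0.11$ the factor has magnitude 1.2 and the iterates diverge, which illustrates why the choice of learning rate matters.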

Stochastic gradient descent

• Data-dependent cost functions;

• A statistical model for the data;

• The typical form of the cost function is $J(w) = E[g(w, x)]$, where E denotes the expectation
(i.e., $E[x] = \int x f(x)\,dx$);

• x is the random vector modeling the observation vector;

• The (unknown) pdf of x is $f(x)$, w.r.t. which the expectation is taken;

• Usually only a sample $x(1), x(2), \ldots$ is given.
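The steepest-descent learning rule becomes

$$w(t) = w(t-1) - \alpha(t) \left. \frac{\partial}{\partial w} E[g(w, x(t))] \right|_{w = w(t-1)} \tag{1.47}$$

or

$$w(t) = w(t-1) - \alpha(t) E\left[ \left. \frac{\partial g(w, x(t))}{\partial w} \right|_{w = w(t-1)} \right] \tag{1.48}$$

Write the expectation as an integral over the pdf f(x), and assume g(w, x) is smooth enough in w for
differentiation and integration to be interchanged. Then this equals

$$w(t) = w(t-1) - \alpha(t) \int \left. \frac{\partial g(w, \xi)}{\partial w} \right|_{w = w(t-1)} f(\xi)\, d\xi \tag{1.49}$$

• In practice, the expectation is approximated by the sample mean;

• Batch learning: all available data are used in each step;

• An on-line version is possible: drop the expectation, which gives

$$w(t) = w(t-1) - \alpha(t) \left. \frac{\partial g(w, x(t))}{\partial w} \right|_{w = w(t-1)} \tag{1.50}$$

Properties of the on-line version:

• The direction of the next move is calculated using only one (incoming) data point;

• The direction of movement fluctuates strongly between steps, but the average direction is
approximately the steepest descent of the batch version;

• The computational cost per step is lower;

• Convergence is slower: more iterations are needed.
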
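As an illustrative sketch, take $g(w, x) = \frac{1}{2}(w - x)^2$, so that $J(w) = E[g(w, x)]$ is minimized at $w = E[x]$. The code below compares two rules on synthetic Gaussian data:

- the on-line rule (1.50);
- a batch rule that replaces the expectation by the sample mean.

The data, the schedule $\alpha(t) = 1/t$ and the batch step size are assumptions. With $\alpha(t) = 1/t$, the on-line update reproduces the running sample mean exactly.

```python
import random


def sgd_online(samples, w0=0.0, alpha=lambda t: 1.0 / t):
    """On-line rule (1.50) for g(w, x) = 0.5 * (w - x)^2, whose gradient is (w - x)."""
    w = w0
    for t, x in enumerate(samples, start=1):
        w = w - alpha(t) * (w - x)
    return w


def batch_gd(samples, w0=0.0, alpha=0.5, steps=50):
    """Batch rule: the expectation in (1.48) replaced by the sample mean of the gradient."""
    w = w0
    for _ in range(steps):
        w = w - alpha * sum(w - x for x in samples) / len(samples)
    return w


rng = random.Random(0)                              # fixed seed for reproducibility
data = [rng.gauss(3.0, 1.0) for _ in range(1000)]   # synthetic observations, E[x] = 3
mean = sum(data) / len(data)
print(sgd_online(data), batch_gd(data), mean)       # all three agree closely
```
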
1.8 Dynamical Systems 

For a dynamical system described by a set of ordinary differential equations, the stability
of the system can be examined by Lyapunov’s second theorem or the Lipschitz condition.

Lyapunov’s second theorem: Consider a dynamic system described by a set of differential
equations

$$\frac{dx}{dt} = f(x) \tag{1.51}$$

where $x = (x_1(t), x_2(t), \ldots, x_n(t))^T$ and $f = (f_1, f_2, \ldots, f_n)^T$. Suppose there exists a positive
definite function $E = E(x)$, called a Lyapunov function or energy function, such that

$$\frac{dE}{dt} = \sum_{j=1}^{n} \frac{\partial E}{\partial x_j} \frac{dx_j}{dt} \le 0 \tag{1.52}$$

with $dE/dt = 0$ only for $dx/dt = 0$. Then the system is stable, and the trajectories x will
asymptotically converge to stationary points as $t \to \infty$.

The stationary points are also known as equilibrium points or attractors. The crucial step
in applying Lyapunov’s second theorem is to find a suitable energy function.
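
The following is a hedged numerical illustration of the theorem. Consider the gradient system $dx/dt = -\nabla E(x)$ with the hypothetical energy $E(x) = x_1^2 + 2x_2^2$. For this system, (1.52) gives $dE/dt = -\|\nabla E\|^2 \le 0$. The sketch integrates the system with forward Euler, assuming a step size small enough for stability. It then checks that E never increases along the discrete trajectory.

```python
def simulate(x0, dt=0.01, steps=2000):
    """Forward-Euler integration of dx/dt = f(x) = -grad E(x), with the
    positive definite energy E(x) = x1^2 + 2 x2^2."""
    def E(x):
        return x[0] ** 2 + 2 * x[1] ** 2

    def f(x):
        return [-2 * x[0], -4 * x[1]]   # -grad E

    x, energies = list(x0), [E(x0)]
    for _ in range(steps):
        dx = f(x)
        x = [xi + dt * di for xi, di in zip(x, dx)]
        energies.append(E(x))
    return x, energies


x_final, energies = simulate([1.0, -1.5])
print(all(b <= a for a, b in zip(energies, energies[1:])))   # True: E never increases
print(x_final)   # very close to the equilibrium point (0, 0)
```
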

Lipschitz Condition: For a dynamic system described by equation (1.51), a sufficient
condition that guarantees the existence and uniqueness of the solution is given by the
Lipschitz condition

$$\|f(x_1) - f(x_2)\| \le \gamma \|x_1 - x_2\| \tag{1.53}$$

where $\gamma$ is a positive constant, called the Lipschitz constant, and $x_1$, $x_2$ are any two
points in the domain of the function vector f. A function f(x) satisfying (1.53) is said to be
Lipschitz continuous.

If (1.53) holds only for $x_1$ and $x_2$ in some neighborhood of x, then f is said to satisfy the Lipschitz
condition locally, and the system will reach a unique solution in the neighborhood of x. The unique
solution is a trajectory that will converge to an attractor asymptotically and reach it only
as $t \to \infty$.

1.9 Probability 
Definition:

Ω: Sample space; contains all possible outcomes of an experiment. Ω can be discrete
or continuous.

ω: A single outcome; $\omega \in \Omega$.

A: A specific event of interest, or set of outcomes, $A \subseteq \Omega$. An event A is said to occur if
the observed outcome ω is an element of A, that is, $\omega \in A$.

Associated with these definitions is a probability space, in which P(A) is a probability function
that assigns a real scalar value to each set A, subject to the constraints:

1. $P(A) \ge 0 \quad \forall A \subseteq \Omega$

2. $P(\Omega) = 1$

3. If the sets or experiment outcomes $A_1, A_2, \ldots, A_n$ are mutually exclusive, that is,
$A_i \cap A_j = \emptyset$ for all $i \neq j$, then $P\left( \bigcup_{i=1}^{n} A_i \right) = \sum_{i=1}^{n} P(A_i)$.
Given the set of all outcomes $A_i$, $i = 1, 2, \ldots, n$, which constitute a partition
of Ω, $\sum_{i} P(A_i) = 1$.

4. $P(\emptyset) = 0$

A consequence of these constraints is monotonicity: for example, $A_1 \subseteq A_2 \Rightarrow P(A_1) \le P(A_2)$.

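The constraints can be checked exhaustively on a small finite example. The sketch assumes a fair six-sided die with uniform probability. It uses exact fractions to avoid rounding. The helper `P` is illustrative.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))   # sample space of one (assumed fair) die


def P(A):
    """Uniform probability on omega: P(A) = |A| / |omega|."""
    if not A <= omega:
        raise ValueError("events must be subsets of the sample space")
    return Fraction(len(A), len(omega))


events = [frozenset(c) for r in range(7) for c in combinations(omega, r)]   # all 2^6 events
even, odd = frozenset({2, 4, 6}), frozenset({1, 3, 5})

print(all(P(A) >= 0 for A in events))                  # constraint 1: True
print(P(omega) == 1)                                   # constraint 2: True
print(P(even | odd) == P(even) + P(odd))               # constraint 3 (disjoint events): True
print(P(frozenset()) == 0)                             # property 4: True
print(all(P(A) <= P(B) for A in events for B in events if A <= B))   # monotonicity: True
print(len(events), P(even))                            # 64 1/2
```
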
1.10 Random Variables 

Assume we are conducting an experiment that involves asking 10 people whether they
like ice cream or not. For each of the surveyed people we record '1' if they say they like
it and '0' if they say they don’t. The sample space for this experiment is the space of all
binary strings of length 10, i.e., it has $2^{10}$ elements. It follows that there are $2^{2^{10}}$ possible
events in this sample space. If we are only interested in how many said that they liked ice
cream, then we can reduce the sample space to the numbers 0 through 10, which is easier
to deal with than the original space. Note that we have really defined a function from the
sample space into the numbers 0, 1, ..., 10 by counting the number of '1's in the binary
string encoding of the answers. In general, we define a random variable to be a function
from a sample space into the real numbers. In most experiments, random variables are
used implicitly, as in “sum of numbers” in a toss of two dice, “number of heads” in 25
coin tosses, etc. Note that in defining a random variable we have defined a new sample
space as the range of the variable, {0, 1, ..., 10} in our ice cream experiment. Given a
sample space S with a probability function P and a random variable X, we can
define a probability function $P_X$ on the range of X using P in the following way. Let $\mathcal{X}$
be the range of X, and let $x \in \mathcal{X}$. We observe $X = x$ if and only if the outcome of the
experiment is an $s \in S$ such that $X(s) = x$. Hence,

$$P_X(X = x) = P(\{s \in S \mid X(s) = x\}) \tag{1.54}$$

If we are concerned with an event $A \subseteq \mathcal{X}$, we define

$$P_X(A) = P(\{s \in S \mid X(s) \in A\}) \tag{1.55}$$

We often write $P(X \in A)$ for $P_X(A)$ and $P(X = x)$ for $P_X(X = x)$ if no confusion can
arise.

We can associate several functions with a random variable. The first is the cumulative
distribution function or CDF, defined by

$$F_X(x) = P_X(X \le x), \quad \text{for all } x \tag{1.56}$$

A random variable is discrete if $F_X(x)$ is a step function of x, and is continuous if $F_X(x)$
is continuous. The probability mass function or PMF of a discrete random variable is

$$f_X(x) = P_X(X = x), \quad \text{for all } x \tag{1.57}$$

Similarly, the probability density function (PDF) of a continuous random variable is defined as
the function $f_X$ that satisfies

$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt, \quad \text{for all } x \tag{1.58}$$
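
The ice cream experiment can be enumerated directly. Assume, purely for illustration, that all $2^{10}$ answer strings are equally likely. The sketch below then builds the PMF of X via (1.54) and the CDF via (1.56). Under that assumption, X turns out to be binomial with n = 10 and p = 1/2.

```python
from itertools import product
from math import comb

# Sample space S: all binary strings of length 10 (1 = likes ice cream).
S = list(product((0, 1), repeat=10))
P_s = 1 / len(S)          # assumption: every answer string is equally likely


def X(s):
    """Random variable: the number of 1s in the answer string."""
    return sum(s)


# PMF, (1.54) and (1.57): P_X(X = x) = P({s in S | X(s) = x})
pmf = {x: sum(P_s for s in S if X(s) == x) for x in range(11)}
# CDF, (1.56): F_X(x) = P_X(X <= x)
cdf = {x: sum(pmf[u] for u in range(x + 1)) for x in range(11)}

print(len(S))      # 1024 = 2^10
print(pmf[5])      # 0.24609375 = 252/1024
print(cdf[10])     # 1.0
print(all(pmf[x] == comb(10, x) / 1024 for x in range(11)))   # True: binomial(10, 1/2)
```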

1.11 Probability Distribution and Densities 
The Probability Distribution

The probability distribution f(x) of a discrete random variable is defined as

$$P(X = x) = f(x) \tag{1.59}$$

The Cumulative Distribution Function (CDF)

The CDF of a discrete random variable is

$$F_X(x_0) = \sum_{u \le x_0} f(u) \tag{1.60}$$

The CDF $F_x(x_0)$ of a random vector $x = (x_1, x_2, \ldots, x_n)^T$ at the point $x = x_0$ is

$$F_x(x_0) = P(x \le x_0) \tag{1.61}$$

where the inequality is taken componentwise.

The Multivariate Probability Density Function (PDF)

The PDF $p_x(x)$ of a continuous random vector x is defined as the derivative of its CDF,

$$p_x(x_0) = \left. \frac{\partial^n F_x(x)}{\partial x_1 \cdots \partial x_n} \right|_{x = x_0} \tag{1.62}$$

Hence,

$$F_x(x_0) = \int_{-\infty}^{x_0} p_x(x)\,dx \tag{1.63}$$

where the integration limits are taken componentwise. For a discrete random variable the corresponding formula is

$$F_x(x_0) = \sum_{u \le x_0} P(x = u) \tag{1.64}$$

where the sum is taken over all u for which $u \le x_0$.

The Joint Distribution Function

The joint distribution function of vectors x and y is given by

$$F_{x,y}(x_0, y_0) = P(x \le x_0, y \le y_0) \tag{1.65}$$

where $x_0$ and $y_0$ are vectors of the same dimensions as x and y, respectively. Thus, the joint
distribution function gives the probability of the event $x \le x_0$ and $y \le y_0$.

The joint density function $p_{x,y}(x, y)$ is defined analogously to the previous definitions, by
differentiating the joint probability distribution w.r.t. all components. It follows that the
probability of the event $(x \le x_0, y \le y_0)$ is

$$P(x \le x_0, y \le y_0) = \int_{-\infty}^{x_0} \int_{-\infty}^{y_0} p_{x,y}(\xi, \eta)\,d\eta\,d\xi \tag{1.66}$$

1.12 Gaussian Distribution 

The Gaussian distribution, also known as the normal distribution, is the most common
assumption for error distributions. The PDF of the normal distribution is defined as

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \tag{1.67}$$

for $x \in R$, where μ is the mean and σ > 0 is the standard deviation. For the Gaussian
distribution, 99.73% of the data are within the range $[\mu - 3\sigma, \mu + 3\sigma]$. The Gaussian
distribution has first-order moment (mean) μ and second-order central moment (variance) σ². All of
its cumulants of order higher than two are zero. If μ = 0 and σ = 1, the distribution is called the
standard normal distribution. Regarded as a function of the parameters (μ, σ) for a fixed observation
x, the PDF is also known as the likelihood function. A maximum-likelihood (ML) estimator is a set
of values (μ, σ) that maximizes the likelihood function for a fixed value of x.

The cumulative distribution function (CDF) is defined as the probability that a random
variable is less than or equal to a value x, that is

$$F(x) = \int_{-\infty}^{x} p(t)\,dt \tag{1.68}$$


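The standard normal CDF, conventionally denoted Φ, is given by setting μ = 0 and σ = 1.
The standard normal CDF is usually expressed by

$$\Phi(x) = \frac{1}{2}\left[ 1 + \operatorname{erf}\left( \frac{x}{\sqrt{2}} \right) \right] \tag{1.69}$$

where the error function erf(x) is a nonelementary function defined by

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2}\,dt \tag{1.70}$$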

When the vector $x \in R^n$, the PDF of the normal distribution is defined by

$$p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \right) \tag{1.71}$$

where μ and Σ are the mean vector and the covariance matrix, respectively.
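
The following sketch implements (1.67) and (1.69) using Python's `math.erf`. It reproduces the 99.73% figure for $[\mu - 3\sigma, \mu + 3\sigma]$ and checks (1.68) with a midpoint-rule integral. The parameter values and grid size are arbitrary choices.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Equation (1.67)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Phi((x - mu) / sigma), with Phi given by (1.69)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))


mu, sigma = 2.0, 0.5
within_3sigma = normal_cdf(mu + 3 * sigma, mu, sigma) - normal_cdf(mu - 3 * sigma, mu, sigma)
print(round(within_3sigma, 4))   # 0.9973

# Check (1.68): integrate the pdf (midpoint rule) from mu - 10 sigma up to mu + sigma
n, lo, hi = 100_000, mu - 10 * sigma, mu + sigma
h = (hi - lo) / n
area = sum(normal_pdf(lo + (k + 0.5) * h, mu, sigma) for k in range(n)) * h
print(abs(area - normal_cdf(mu + sigma, mu, sigma)) < 1e-8)   # True
```
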

1.13 Cauchy Distribution

The Cauchy distribution, also known as the Cauchy-Lorentz distribution, is another
popular data-distribution model. The density of the Cauchy distribution is defined as

$$p(x) = \frac{1}{\pi \sigma \left[ 1 + \left( \frac{x - \mu}{\sigma} \right)^2 \right]} \tag{1.72}$$

for $x \in R$, where μ specifies the location of the peak and σ is the scale parameter that
specifies the half-width at half-maximum. When μ = 0 and σ = 1, the distribution is
called the standard Cauchy distribution.

Accordingly, the CDF of the Cauchy distribution is calculated by

$$F(x) = \frac{1}{2} + \frac{1}{\pi} \arctan\left( \frac{x - \mu}{\sigma} \right) \tag{1.73}$$

None of the moments of order one or higher (in particular, neither the mean nor the variance) is
defined for the Cauchy distribution. The median of the distribution is equal to μ. The Cauchy
distribution has heavier tails than the Gaussian distribution. This makes it valuable in stochastic
search algorithms, since it allows larger subspaces of the data space to be searched.
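
This sketch implements (1.72) and (1.73) with arbitrary parameters μ = 1 and σ = 2. It checks that the median is μ and that σ is the half-width at half-maximum. It also compares a tail probability with that of a Gaussian having the same μ and σ, which quantifies the heavier tail mentioned above.

```python
import math


def cauchy_pdf(x, mu=0.0, sigma=1.0):
    """Equation (1.72)."""
    z = (x - mu) / sigma
    return 1.0 / (math.pi * sigma * (1.0 + z * z))


def cauchy_cdf(x, mu=0.0, sigma=1.0):
    """Equation (1.73)."""
    return 0.5 + math.atan((x - mu) / sigma) / math.pi


mu, sigma = 1.0, 2.0
peak = cauchy_pdf(mu, mu, sigma)
print(cauchy_cdf(mu, mu, sigma))                                    # 0.5: the median is mu
print(math.isclose(cauchy_pdf(mu + sigma, mu, sigma), peak / 2))   # True: half-width at half-maximum

# Heavier tail than a Gaussian with the same location and scale: P(X > mu + 5 sigma)
cauchy_tail = 1 - cauchy_cdf(mu + 5 * sigma, mu, sigma)
gauss_tail = 0.5 * math.erfc(5 / math.sqrt(2))
print(cauchy_tail, gauss_tail)   # about 0.063 versus about 2.9e-7
```
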

1.14 Markov Processes, Markov Chains, and Markov-chain Analysis 
Given a stochastic process $\{X(t): t \in T\}$, where t is time and X(t) is a state in the state space S,
a Markov process is defined as a stochastic process that satisfies the relation
characterized by the conditional distribution

$$P[X(t_0 + t_1) \le x \mid X(t_0) = x_0, X(\tau) = x_\tau, -\infty < \tau < t_0] = P[X(t_0 + t_1) \le x \mid X(t_0) = x_0] \tag{1.74}$$

for any value of $t_0$ and for $t_1 > 0$. The future distribution of the process is determined by
the present value $X(t_0)$ only.

When T and S are discrete, a Markov process is called a Markov chain. Conventionally,
time is indexed using integers, and a Markov chain is a set of random variables that
satisfy

$$P[X_n = x_n \mid X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, \ldots] = P[X_n = x_n \mid X_{n-1} = x_{n-1}] \tag{1.75}$$

This definition can be extended to multistep Markov chains, where a chain state has
conditional dependency on only a finite number of its previous states.

For a Markov chain, $P[X_n = j \mid X_{n-1} = i]$ is the transition probability from state i to state j at time
n − 1. If

$$P[X_n = j \mid X_{n-1} = i] = P[X_{n+m} = j \mid X_{n+m-1} = i] \tag{1.76}$$

for m > 0 and $i, j \in S$, the chain is said to be time homogeneous. In this case, one can
denote

$$P_{ij} = P[X_n = j \mid X_{n-1} = i] \tag{1.77}$$

and the transition probabilities can be represented by a matrix, called the transition
matrix, $P = [P_{ij}]$, where i, j = 0, 1, .... For finite S, P has a finite dimension.

In Markov-chain analysis, the transition probability after k step transitions is $P^k$. The
stationary distribution or steady-state distribution is a probability vector $\pi^*$ that satisfies

$$P^T \pi^* = \pi^* \tag{1.78}$$

That is, $\pi^*$ is the left eigenvector of P corresponding to the eigenvalue 1. Suppose a finite-state P is
irreducible and aperiodic. That is, every state is accessible from every other state, and
no state can be revisited only at multiples of some period greater than one. Then $P^k$ converges elementwise
to a matrix each row of which is the unique stationary distribution $\pi^{*T}$, with

$$\lim_{k \to \infty} P^k = \mathbf{1} \pi^{*T} \tag{1.79}$$

where $\mathbf{1}$ is the all-ones column vector. Many modeling applications are Markovian, and
Markov-chain analysis is widely used for the convergence analysis of algorithms.
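
This sketch illustrates (1.78) and (1.79) for a hypothetical 3-state chain. The chain is irreducible and aperiodic because all of its entries are positive. Raising P to a high power makes every row approach $\pi^{*T}$. For this matrix the exact answer is (10/41, 19/41, 12/41).

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical transition matrix; P[i][j] = P[X_n = j | X_{n-1} = i], rows sum to 1
P = [[0.5, 0.4, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]

# P^k for large k: every row approaches the stationary distribution, as in (1.79)
Pk = P
for _ in range(200):
    Pk = mat_mul(Pk, P)
pi_star = Pk[0]

# Check (1.78): pi*^T P = pi*^T, i.e. pi* is the left eigenvector for eigenvalue 1
pi_next = [sum(pi_star[i] * P[i][j] for i in range(3)) for j in range(3)]
print([round(p, 6) for p in pi_star])                          # [0.243902, 0.463415, 0.292683]
print(max(abs(a - b) for a, b in zip(pi_star, pi_next)) < 1e-12)   # True
```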

1.15 Bayes Decision Rule for Minimum Error 

Consider C classes, $\omega_1, \ldots, \omega_C$, with a priori probabilities (the probabilities of
each class occurring) $p(\omega_1), \ldots, p(\omega_C)$, assumed known. Suppose we wish to minimize
the probability of making an error, and we have no information regarding an object other
than the class probability distribution. Then we would assign an object to class $\omega_j$ if

$$p(\omega_j) > p(\omega_k) \qquad k = 1, \ldots, C;\ k \neq j \tag{1.80}$$

This classifies all objects as belonging to one class. For classes with equal probabilities,
patterns are assigned arbitrarily between those classes.

In practice, however, we usually do have an observation vector or measurement vector x, and we wish to
assign x to one of the C classes. A decision rule based on probabilities is to assign x to
class $\omega_j$ if the probability of class $\omega_j$ given the observation x, $p(\omega_j \mid x)$, is greatest over all
classes $\omega_1, \ldots, \omega_C$. That is, assign x to class $\omega_j$ if

$$p(\omega_j \mid x) > p(\omega_k \mid x) \qquad k = 1, \ldots, C;\ k \neq j \tag{1.81}$$

This decision rule partitions the measurement space into C regions $\Omega_1, \ldots, \Omega_C$ such
that if $x \in \Omega_j$ then x belongs to class $\omega_j$.

The a posteriori probabilities $p(\omega_j \mid x)$ may be expressed in terms of the a priori
probabilities and the class-conditional density functions $p(x \mid \omega_j)$ using Bayes’ theorem as

$$p(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, p(\omega_j)}{p(x)}, \qquad p(x) = \sum_{k=1}^{C} p(x \mid \omega_k)\, p(\omega_k) \tag{1.82}$$

Since p(x) is common to all classes, the decision rule (1.81) may be written: assign x to $\omega_j$ if

$$p(x \mid \omega_j)\, p(\omega_j) > p(x \mid \omega_k)\, p(\omega_k) \qquad k = 1, \ldots, C;\ k \neq j \tag{1.83}$$

This is known as Bayes’ rule for minimum error.

For two classes, the decision rule (1.83) may be written

$$l_r(x) = \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{p(\omega_2)}{p(\omega_1)} \quad \text{implies} \quad x \in \text{class } \omega_1 \tag{1.84}$$

where $l_r(x)$ is the likelihood ratio.

The fact that the decision rule (1.83) minimizes the error may be seen as follows. The
probability of making an error, p(error), may be expressed as

$$p(\text{error}) = \sum_{i=1}^{C} p(\text{error} \mid \omega_i)\, p(\omega_i) \tag{1.85}$$

where $p(\text{error} \mid \omega_i)$ is the probability of misclassifying patterns from class $\omega_i$. This is
given by

$$p(\text{error} \mid \omega_i) = \int_{C[\Omega_i]} p(x \mid \omega_i)\,dx \tag{1.86}$$

which is the integral of the class-conditional density function over $C[\Omega_i]$. Here $C[\Omega_i]$ is the region of
measurement space outside $\Omega_i$ (C is the complement operator), i.e., $\bigcup_{j=1, j \neq i}^{C} \Omega_j$.
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Therefore, we may write the probability of misclassifying a pattern as 


c 

p{error) = ^ )dx 

c 

= l-^p(tOi)[ p{x\ayx 

i=i 


(1.87) 


from which we see that minimizing the probability of making an error is equivalent to maximizing

$$\sum_{i=1}^{C} p(\omega_i) \int_{\Omega_i} p(x \mid \omega_i)\, dx \qquad (1.88)$$

the probability of correct classification. Therefore, we wish to choose the regions $\Omega_i$ so that the integral given in (1.88) is a maximum. This is achieved by selecting $\Omega_i$ to be the region for which $p(\omega_i)\, p(x \mid \omega_i)$ is the largest over all classes, and the probability of correct classification, $c$, is

$$c = \int \max_i\, p(\omega_i)\, p(x \mid \omega_i)\, dx \qquad (1.89)$$


where the integral is over the whole of the measurement space, and the Bayes error is

$$e_B = 1 - \int \max_i\, p(\omega_i)\, p(x \mid \omega_i)\, dx \qquad (1.90)$$
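Equations (1.89) and (1.90) can be evaluated numerically when the densities are known. The sketch below assumes one-dimensional Gaussian class-conditional densities and approximates the integral over the measurement space by the midpoint rule on a finite interval; the interval, step count and function names are illustrative choices, not part of the text.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate normal density N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def correct_and_bayes_error(priors, means, stds, lo=-20.0, hi=20.0, steps=40000):
    """Approximate c (1.89) and e_B (1.90) by integrating max_i p(w_i) p(x|w_i).

    The integral over the whole measurement space is truncated to [lo, hi],
    which is harmless when the densities are negligible outside that interval.
    """
    dx = (hi - lo) / steps
    c = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx  # midpoint of the i-th cell
        c += max(p * gaussian_pdf(x, m, s) for p, m, s in zip(priors, means, stds)) * dx
    return c, 1.0 - c

c, e_b = correct_and_bayes_error([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
print(round(c, 4), round(e_b, 4))
```

For two equally probable classes with unit variances and means two units apart, the Bayes error is the standard normal tail probability $\Phi(-1) \approx 0.1587$, which the numerical value reproduces. No classifier can achieve a lower error on this problem.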
Chapter 2: Pattern Recognition: Scope and Methods

Abstract 

Pattern recognition encompasses a wide range of information processing problems of great practical significance, from speech recognition and the classification of handwritten characters to fault detection in machinery and medical diagnosis. It is well understood that the human brain solves these problems well and effortlessly. However, solving them by computer has, in many cases, proved to be immensely difficult. In order to have the best opportunity of developing effective solutions, it is important to consider the various existing approaches and methods for pattern recognition by machine. In this chapter we discuss the different approaches to pattern recognition, from conventional statistical inference to modern soft computing approaches.

2.1 Introduction 

Recognition is a basic property of all human beings. When a person sees an object, he or she first gathers information about the object and compares its properties and behaviors with the existing knowledge stored in the mind. If a proper match is found, the object is recognized [1]. Recognition seems simple in the real world, but in computer science, recognizing any object is a remarkable feat. The functionality of the human brain in this respect is remarkable and is not matched by any machine or software. The act of recognition can be divided into two broad categories:
1. Concrete item recognition: the recognition of spatial samples, such as fingerprints, weather maps, pictures and physical objects, and of temporal samples, such as waveforms and signatures.

2. Abstract item recognition: the recognition of a solution to a problem, or of an old conversation or argument.

Pattern recognition, as a subject, spans a number of scientific disciplines, uniting them in the search for a solution to the common problem of recognizing a pattern of a given class and assigning it the name of the identified class.

But what is a pattern? Watanabe [2] defines a pattern as the opposite of chaos; it is an entity, vaguely defined, that could be given a name. A pattern could be a fingerprint image, a handwritten cursive word, a human face, or a speech signal.

Some typical applications of pattern recognition systems are the following. In weather prediction, the input data are weather maps and the output is a forecast. In medical diagnosis, the symptoms serve as input data and the identity of the disease serves as the output. In stock market prediction, the input data are financial news and charts, and the desired output is the predicted ups and downs of the market.

Pattern recognition is the categorization of input data into identifiable classes by extracting the significant attributes of the data from irrelevant background detail. A pattern class is a category determined by some common attributes; it is a family of patterns that share some common properties. A pattern is then the description of a member of a category, representing its pattern class. Pattern recognition by machine involves techniques for assigning patterns to their classes automatically, with as little human intervention as possible. Pattern recognition aims to classify data (patterns) based on either a priori knowledge or statistical information extracted from the patterns. The patterns to be classified are usually groups of measurements or observations, defining points in an appropriate multidimensional space [3].

A complete pattern recognition system consists of a sensor that gathers the observations 
to be classified or described; a feature extraction mechanism that computes numeric or 
symbolic information from the observations; and a classification or description scheme 
that does the actual job of classifying or describing observations, relying on the extracted 
features. 

Pattern recognition problems may be logically divided into two broad categories. The first involves the study of the pattern recognition capabilities of human beings and other living organisms, while the second involves the development of theory and techniques for the design of devices capable of performing a given recognition task for a specific application.

The first area deals with psychology, physiology and biology, whereas the second deals with computer science, information science and artificial intelligence. Pattern recognition is concerned primarily with the description and analysis or classification of measurements taken from physical or mental processes [4]. The area of pattern recognition deals with the following:
i. Character Recognition: optical or handwritten character recognition,

ii. Visual Image Recognition,

iii. Voice Data Recognition,

iv. Speech Analysis,

v. Man and Machine Diagnostics,

vi. Person Identification and Industrial Inspection.

During the last few years, researchers have proposed many mathematical approaches to solving pattern recognition problems. The available methods of pattern recognition may be categorized into three basic approaches:

i. Statistical Methods, comprising sub-disciplines such as discriminant analysis, feature extraction, error estimation and cluster analysis;

ii. Structural Methods, comprising grammatical inference and parsing;

iii. Artificial Intelligence based Methods.

2.2 Statistical Methods 

Statistical pattern recognition is an approach to machine intelligence based on the statistical modeling of data. In a statistical model, one applies probability theory and decision theory to derive an algorithm. Over the years there has been progress in the theory and applications of 'Statistical Pattern Recognition' [5-9]. The three major issues encountered in the design of a statistical pattern recognition system are sensing, feature extraction, and classification. The first issue is the representation of the input data, which are measured from the objects to be recognized; this is called the sensing problem. Each measured quantity describes a characteristic of the pattern or object.

The number of features of the pattern samples is usually very large. It is reduced by retaining only their salient characteristics, a process referred to as feature extraction. Several approaches to feature extraction have been proposed, such as feature extraction by moment invariants [10], by autoregressive models [11], and by the Karhunen-Loève (KL) transformation [12].
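Feature extraction by the KL transformation can be illustrated with a minimal sketch: the samples are centred and projected onto the eigenvector of their sample covariance matrix with the largest eigenvalue, which reduces each sample to one feature. The restriction to two-dimensional samples (which permits a closed-form eigen-decomposition using only the standard library), the single retained component and the function name are assumptions made for illustration.

```python
import math

def kl_transform_2d(points):
    """Project 2-D samples onto the leading eigenvector of their covariance matrix.

    Returns (unit eigenvector, largest eigenvalue, list of 1-D features).
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the (biased) sample covariance matrix [[a, b], [b, d]].
    a = sum((p[0] - mx) ** 2 for p in points) / n
    d = sum((p[1] - my) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)
    # Matching eigenvector; if b == 0 the matrix is diagonal and an axis is the answer.
    if abs(b) > 1e-12:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # One KL feature per sample: its centred coordinate along the principal axis.
    features = [(p[0] - mx) * vx + (p[1] - my) * vy for p in points]
    return (vx, vy), lam, features

v, lam, f = kl_transform_2d([(0, 0), (1, 1), (2, 2), (3, 3)])
print(v, lam)
```

For samples lying on the line $y = x$, the retained direction is $(1, 1)/\sqrt{2}$ and the variance of the extracted feature equals the largest eigenvalue: the transformation keeps the direction of maximum variance and discards the other.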

The third and last issue in statistical pattern recognition is pattern classification, i.e., the development of the classifier. A pattern classifier is defined as a device or process that sorts the given data into identifiable categories or classes. Pattern classification is an information transformation process: the classifier transforms a relatively large set of uninterpreted data into a smaller set of useful data [9]. A trainable classifier is one that can improve its performance over time in response to the information it receives. Training is the process by which the parameters of the classifier are adjusted, and the classifier is trained using the reduced pattern samples. It is often assumed that the pattern samples of a given class occupy a finite region of the pattern space, called the class region.

Good classification is the main objective of a recognition system. Generally, statistical pattern recognition problems fall into two main categories: supervised classification (discriminant analysis) problems and unsupervised classification (clustering) problems. When the samples have known classifications (labeled samples), the classification is called supervised; otherwise it is unsupervised [14,15]. In supervised classification, the data are labeled, and the problem is to design a classifier that predicts the class label of any given sample. In unsupervised classification, the data are not labeled, and the problem is to group the data so that the features of one group distinguish it from another.

In supervised classification, each sample is assumed to come from one of $c$ possible classes $\Omega_j$, $j = 1, 2, \ldots, c$. Suppose we are given $n$ training samples $(x_i, y_i)$, $i = 1, 2, \ldots, n$, where $x_i$ represents the data and $y_i \in \{1, 2, \ldots, c\}$ the corresponding class label. The objective is to design a decision function $g(x)$ from these training samples such that $g(x)$ can accurately predict the class label of any given test sample $x$.

A classical way to tackle this problem is to use Bayes' decision rule [16], which assigns a test sample $x$ to the class with the maximum posterior probability $p(\Omega_j \mid x)$, that is,

$$x \in \Omega_k \quad \text{if} \quad p(\Omega_k \mid x) = \max_{j = 1, 2, \ldots, c} p(\Omega_j \mid x) \qquad (2.1)$$

where $p(\Omega_j \mid x)$ is the conditional probability that the observed object belongs to class $\Omega_j$ given the associated sample data $x$. In practice these posterior probabilities are usually unknown. By applying Bayes' theorem, they may be calculated as

$$p(\Omega_j \mid x) = \frac{p(x \mid \Omega_j)\, p(\Omega_j)}{p(x)} \qquad (2.2)$$
where $p(x)$ is the probability density that the sample $x$ occurs, $p(\Omega_j)$ is the prior probability that class $\Omega_j$ occurs, and $p(x \mid \Omega_j)$ is the class-conditional density of $x$ given that the sample is known to come from class $\Omega_j$. Since $p(x)$ does not depend on the class, the decision rule becomes

$$x \in \Omega_k \quad \text{if} \quad p(x \mid \Omega_k)\, p(\Omega_k) \geq p(x \mid \Omega_j)\, p(\Omega_j), \quad \forall j = 1, 2, \ldots, c \qquad (2.3)$$

Generally, the prior probabilities $p(\Omega_j)$ and the class-conditional densities $p(x \mid \Omega_j)$ are unknown and need to be estimated. Compared with the estimation of the prior probabilities $p(\Omega_j)$, the estimation of the class-conditional densities $p(x \mid \Omega_j)$ is much more difficult.

If we know the functional form of the class-conditional densities $p(x \mid \Omega_j)$ (e.g., multivariate Gaussian) but not their parameters (e.g., mean and covariance), we face a parametric decision problem. In this case, a common strategy is to replace the unknown parameters in the density functions with estimated values, resulting in the so-called Bayes plug-in classifier. If the form of the class-conditional densities is not known, then we operate in a non-parametric mode. In this case, we may either directly estimate the density for a given sample (e.g., Parzen windows and k-nearest neighbor) or construct a decision boundary based on the training samples. In the latter case, assuming that the decision boundary is linear leads to a linear discriminant method (e.g., perceptrons and Fisher's rule), while assuming that it is nonlinear leads to a nonlinear discriminant method (e.g., feedforward neural networks and support vector machines). These basic approaches can be organized into the tree structure shown in Figure 2.1.
Figure 2.1: Various Approaches in Supervised Classification
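The Bayes plug-in classifier described above can be sketched as follows, under assumptions chosen for illustration: scalar samples, a Gaussian form for each class-conditional density, priors estimated by class frequencies, and means and variances replaced by their maximum likelihood estimates (discussed in Section 2.2.1). The rule applied is (2.3), evaluated in logarithms; the function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_plugin_gaussian(samples, labels):
    """Estimate (prior, mean, variance) for each class from labelled scalar samples."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    n = len(samples)
    params = {}
    for y, xs in groups.items():
        m = sum(xs) / len(xs)
        # ML (biased) variance; it is zero if all samples of a class coincide,
        # which would break the log below, so real use needs at least 2 distinct values.
        var = sum((x - m) ** 2 for x in xs) / len(xs)
        params[y] = (len(xs) / n, m, var)
    return params

def predict_plugin(x, params):
    """Rule (2.3) with plug-in estimates: maximise log p(x|class) + log p(class)."""
    def log_score(p):
        prior, m, var = p
        return math.log(prior) - 0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
    return max(params, key=lambda y: log_score(params[y]))

xs = [0.1, -0.3, 0.4, 0.0, 2.9, 3.2, 3.1, 2.8]
ys = [1, 1, 1, 1, 2, 2, 2, 2]
params = fit_plugin_gaussian(xs, ys)
print(predict_plugin(0.2, params), predict_plugin(2.7, params))
```

Working with logarithms does not change the decision, because the logarithm is monotonically increasing, and it avoids underflow when densities are very small.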
2.2.1 Parametric Methods 

In the parametric approach, the functional forms of the underlying densities of the pattern classes are known, but the values of their parameters might be unknown. If the parameters were known, the discriminant functions based on them could be readily specified. In practice, when a large number of pattern samples is available, the class density functions can be estimated or learned from the samples [4], and the estimated parameters are then used to specify the discriminant functions. An important situation in which pattern classes are characterized by a set of parameters occurs when the patterns in each of the $N$ classes are random variables governed by $N$ distinct probability functions. The most widely accepted parametric classifier in pattern recognition is the Bayes classifier. This classifier assigns zero loss to correct classifications and equal loss to incorrect classifications, and its optimal decision minimizes the probability of classification error. For a two-class problem, the Bayes discriminator gives an optimum decision by assigning the pattern sample $x$ to $\Omega_1$ if $p(x \mid \Omega_1)\, p(\Omega_1) > p(x \mid \Omega_2)\, p(\Omega_2)$, and to $\Omega_2$ otherwise.

If the functional form of the class-conditional densities is known but the parameters of the density functions are unknown, we may estimate the unknown parameters from the available samples. Without loss of generality, we may drop the class label and write $p(x \mid \Omega_j)$ as $p(x; \theta)$, where $\theta$ is a parameter vector to be estimated. Commonly used parametric models are the multivariate Gaussian distribution for continuous variables, the binomial distribution for binary variables, and the multinomial distribution for integer-valued (or categorical) variables.

Applying the most widely used estimator, namely the maximum likelihood estimator, we select as the estimate $\hat{\theta}$ the particular value of $\theta$ that gives the greatest value of the likelihood, defined for $n$ independent samples $x_1, \ldots, x_n$ by

$$L(\theta) = L(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} p(x_i; \theta) \qquad (2.4)$$

An important result of using this method is that, when $p(x; \theta)$ is a multivariate Gaussian distribution and the covariance matrices of the different classes are assumed to be different, the optimal classifier has a quadratic decision boundary. More interestingly, if the covariance matrices are assumed to be identical, the optimal classifier has a linear decision boundary.
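To see where the quadratic and linear boundaries come from, take the logarithm of $p(x \mid \Omega_j)\, p(\Omega_j)$ for a $d$-dimensional Gaussian class density with mean $\mu_j$ and covariance $\Sigma_j$ (symbols introduced here only for this sketch):

$$g_j(x) = \ln p(x \mid \Omega_j) + \ln p(\Omega_j) = -\tfrac{1}{2}(x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j) - \tfrac{1}{2}\ln|\Sigma_j| - \tfrac{d}{2}\ln 2\pi + \ln p(\Omega_j)$$

Because the logarithm is monotonically increasing, comparing the $g_j(x)$ is equivalent to applying rule (2.3). When the $\Sigma_j$ differ, each $g_j(x)$ contains its own term $-\tfrac{1}{2}x^T \Sigma_j^{-1} x$, so a boundary $g_i(x) = g_j(x)$ is quadratic in $x$. When $\Sigma_j = \Sigma$ for all classes, the terms $-\tfrac{1}{2}x^T \Sigma^{-1} x$, $-\tfrac{1}{2}\ln|\Sigma|$ and $-\tfrac{d}{2}\ln 2\pi$ are common to every class and can be dropped, leaving

$$g_j(x) = \mu_j^T \Sigma^{-1} x - \tfrac{1}{2}\mu_j^T \Sigma^{-1} \mu_j + \ln p(\Omega_j)$$

which is linear in $x$, with weight vector $\Sigma^{-1}\mu_j$ and bias $-\tfrac{1}{2}\mu_j^T \Sigma^{-1}\mu_j + \ln p(\Omega_j)$. In the plug-in classifier, $\mu_j$ and $\Sigma_j$ are replaced by their maximum likelihood estimates.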

In general, the parametric approach works well when the number of features (i.e., the dimension of $x$) is small. Unfortunately, the parametric approach suffers from a phenomenon known as the "curse of dimensionality": the number of samples required grows exponentially with the feature dimension. To address this problem, various regularization techniques [17] have been proposed to obtain robust estimates in situations with small sample size and/or high dimension. Among them, the best known is the Regularized Discriminant Analysis proposed by Friedman [18].

In many real-world problems, the data may form subgroups within each class. In this case, we may approximate the density $p(x; \theta)$ by a finite mixture model,

$$p(x; \theta) = \sum_{j=1}^{g} \pi_j\, p(x; \eta_j) \qquad (2.5)$$

where $g$ is the number of mixture components, $\pi_j$ is a mixing proportion ($\pi_j \geq 0$, $\sum_{j=1}^{g} \pi_j = 1$), and $p(x; \eta_j)$ is a component density function that depends on the parameters $\eta_j$. There are three sets of parameters to be estimated: $\pi_j$, $\eta_j$ and $g$.
Generally, the value of $g$ is the most difficult to estimate. Wolfe [19] proposed a modified likelihood ratio test to estimate the number of components. This approach was later investigated by Everitt [20] and Anderson [21]. Other approaches include those of Bozdogan [22] and Celeux and Soromenho [23], which are based on information-theoretic criteria such as entropy. If the value of $g$ is already known, the values of $\pi_j$ and $\eta_j$ can be estimated using the Expectation Maximization (EM) method [24], which extends the maximum likelihood estimator of the uni-modal case (each class has only one group). Although the EM method is very easy to implement, its convergence rate can be poor, depending on the data distribution and the initial parameters. Jamshidian and Jennrich [25] proposed a generalized conjugate gradient method to speed up the EM method, and Lindsay and Basak [26] described a method to initialize the EM model. In addition to EM, several authors have also advocated the Markov chain Monte Carlo method [27], but this method is computationally demanding.
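The EM method for the mixture model (2.5) with known $g$ can be sketched as follows. The sketch assumes one-dimensional Gaussian components, so $\eta_j = (\text{mean}_j, \text{variance}_j)$; it runs a fixed number of iterations instead of testing convergence, and puts a small floor under each variance. These choices, the initial values and the function names are illustrative assumptions, not a prescribed implementation.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, means, variances, weights, iters=100):
    """Fit a g-component 1-D Gaussian mixture by EM, g = len(means).

    Returns the estimated mixing proportions pi_j, means and variances.
    """
    n, g = len(xs), len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iters):
        # E-step: r[i][j] = posterior probability that x_i came from component j.
        r = []
        for x in xs:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(g)]
            total = sum(dens)
            r.append([d / total for d in dens])
        # M-step: responsibility-weighted maximum likelihood re-estimates.
        for j in range(g):
            nj = sum(r[i][j] for i in range(n))
            weights[j] = nj / n
            means[j] = sum(r[i][j] * xs[i] for i in range(n)) / nj
            var = sum(r[i][j] * (xs[i] - means[j]) ** 2 for i in range(n)) / nj
            variances[j] = max(var, 1e-6)  # floor guards against a collapsing component
    return weights, means, variances

xs = [-0.5, -0.2, 0.0, 0.1, 0.4, 9.6, 9.9, 10.0, 10.2, 10.5, 10.3]
w, m, v = em_gmm_1d(xs, [1.0, 8.0], [1.0, 1.0], [0.5, 0.5])
print([round(t, 3) for t in w], [round(t, 3) for t in m])
```

With two well-separated groups, the responsibilities become essentially 0 or 1 and the estimates settle at the group proportions, means and variances. With overlapping groups or poor starting values, many more iterations may be needed, which is the slow convergence mentioned above.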

2.2.2 Nonparametric Methods 

It is often the case in pattern recognition that assuming the class-conditional density functions to be members of a certain parametric family is not reasonable. The parameter estimation approach of the previous section is then not possible. Instead, we must estimate the class-conditional density functions non-parametrically. In non-parametric estimation (or density estimation), we try to estimate $p(x \mid \Omega_j)$ at each point $x$, whereas in parametric estimation we try to estimate an unknown parameter vector. It is also possible to estimate the posterior probabilities $p(\Omega_j \mid x)$ directly, assuming that the training data were collected using mixed sampling. Two well-known methods for non-parametric estimation are Parzen windows [28] and the k-nearest neighbor method [29, 30].

Both the Parzen windows and k-NN methods are based on one simple idea: the class-conditional density $p(x \mid \Omega_j)$ may be approximated by the fraction of the training samples of class $j$ that fall into a small region around $x$, divided by the volume of that region. (For example, if we define the small region as the $d$-dimensional hypercube whose edges are $h$ units long, its volume is $V = h^d$.) Specifically, the estimated class-conditional density $\hat{p}(x \mid \Omega_j)$ is

$$\hat{p}(x \mid \Omega_j) = \frac{k_j / n_j}{V} \qquad (2.6)$$

where $V$ is the volume of the small region around $x$, $k_j$ is the number of training samples of class $j$ falling in that region, and $n_j$ is the number of training samples of class $j$. It can be shown that, if we have an infinite number of training samples of class $j$ and the volume of the region approaches zero in a suitable way (conditions are given later in this section), the density estimate $\hat{p}(x \mid \Omega_j)$ in equation (2.6) asymptotically converges to the true density $p(x \mid \Omega_j)$.

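Assume that the region $D$ is a $d$-dimensional hypercube. If $h_n$ is the length of the side of the hypercube, its volume is given by $V_n = h_n^d$. The window function that takes the value 1 inside the hypercube centered at the origin and the value 0 outside it is defined as

$$k(z) = \begin{cases} 1, & |z_i| \leq 0.5, \quad i = 1, \ldots, d \\ 0, & \text{otherwise} \end{cases} \qquad (2.7)$$

where $z \in \mathbb{R}^d$. Then $k\big((x - x_i)/h_n\big)$ is 1 if $x_i$ falls in the hypercube $D$ centered at $x$, and 0 otherwise. Thus, the number of training samples of class $j$ falling into the hypercube $D$ is

$$k_j = \sum_{i=1}^{n_j} k\!\left(\frac{x - x_i}{h_n}\right) \qquad (2.8)$$

Substituting equation (2.8) into equation (2.6), we obtain the density estimate

$$\hat{p}(x \mid \Omega_j) = \frac{1}{n_j} \sum_{i=1}^{n_j} \frac{1}{V_n}\, k\!\left(\frac{x - x_i}{h_n}\right) \qquad (2.9)$$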




Note that, in contrast to general density functions, this estimate is not continuous, because the hypercube window function (2.7) is discontinuous. To resolve this problem, continuous window functions are usually used.
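The hypercube estimate (2.9) translates almost line for line into code. The sketch below assumes that samples and the query point are given as sequences of $d$ coordinates and that all samples belong to the one class being estimated; the function names are illustrative.

```python
def hypercube_window(z):
    """Window function (2.7): 1 if |z_i| <= 0.5 for every coordinate, else 0."""
    return 1.0 if all(abs(zi) <= 0.5 for zi in z) else 0.0

def parzen_hypercube(x, samples, h):
    """Density estimate (2.9) at point x from the samples of one class.

    x and each sample are sequences of d coordinates; h is the cube side h_n.
    """
    d = len(x)
    volume = h ** d  # V_n = h_n^d
    # (2.8): count the samples inside the cube of side h centred at x
    k = sum(hypercube_window([(xi - si) / h for xi, si in zip(x, s)]) for s in samples)
    return k / (len(samples) * volume)  # (2.6): (k_j / n_j) / V_n

print(parzen_hypercube([0.1], [[0.0], [0.2], [0.9], [1.5]], 0.5))
```

Moving $x$ by a small amount can push a sample across a face of the cube and make the count $k_j$ jump by one, which is exactly the discontinuity noted above.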


A more general density estimate is obtained if window functions $k$ other than the one above are considered. The Parzen-window density estimate at $x$ using $n$ training samples and the window function $k$ is defined by

$$\hat{p}(x \mid \Omega_j) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n^d}\, k\!\left(\frac{x - x_i}{h_n}\right) \qquad (2.10)$$


The estimate $\hat{p}(x \mid \Omega_j)$ is an average of the values of the window function at different points. Typically the window function has its maximum at the origin, and its values decrease as we move away from the origin. Each training sample then contributes to the estimate in accordance with its distance from $x$.

The important issue with the Parzen windows method is the selection of the window length $h_n$. This is not an easy task: the choice is often application specific and rooted in the properties of the density estimate required by the application. If $h_n$ is too large, the density estimate $\hat{p}(x \mid \Omega_j)$ will be very smooth and "out of focus". If $h_n$ is too small, the estimate will be just a superposition of $n$ sharp pulses centered at the training samples, i.e., an erratic and noisy estimate that does not reflect the curvature of the true density. A general rule is that $h$ should be small in high-density areas and large in low-density areas.
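The effect of the window length can be seen with a continuous window. The sketch below assumes scalar data and a standard Gaussian window $k(u) = e^{-u^2/2}/\sqrt{2\pi}$ in (2.10); the Gaussian is a common choice but only one possibility, and the function name is illustrative.

```python
import math

def parzen_gaussian(x, samples, h):
    """Parzen estimate (2.10) for scalar data with a Gaussian window.

    The window k(u) = exp(-u^2 / 2) / sqrt(2*pi) is itself a density, so the
    estimate is non-negative and integrates to one; h plays the role of h_n.
    """
    n = len(samples)
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return total / (n * h * math.sqrt(2.0 * math.pi))

samples = [0.0, 1.0]
for h in (0.05, 1.0):
    print(h, [round(parzen_gaussian(x, samples, h), 3) for x in (0.0, 0.5, 1.0)])
```

With $h = 0.05$ the estimate consists of two sharp pulses (about 3.99 at each sample and essentially zero at $x = 0.5$), whereas with $h = 1$ the two samples blur into a single broad bump whose maximum lies between them.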
Parzen estimates have some interesting properties. For $\hat{p}(x \mid \Omega_j)$ to be a legitimate density, it is required that

i. $\hat{p}(x \mid \Omega_j) \geq 0$ for all $x$, and

ii. $\int \hat{p}(x \mid \Omega_j)\, dx = 1$.

Both conditions hold if the window function $k$ is itself a legitimate density and we maintain the relation $V_n = h_n^d$.

With an unlimited number of training samples, it is possible to let $V_n$ approach 0 and have $\hat{p}_n(x \mid \Omega_j)$ converge to $p(x \mid \Omega_j)$. For this, it is required that:

i. The density function $p(x \mid \Omega_j)$ is continuous,

ii. The window function is bounded and a legitimate density,

iii. The values of the window function are negligible at infinity,

iv. $V_n \to 0$ when $n \to \infty$,

v. $n V_n \to \infty$ when $n \to \infty$.

For pattern recognition, these convergence properties are important, because with them we are able to show that the classification error of a classifier based on Parzen windows tends to the Bayes error as $n$ approaches infinity. This also holds more generally: when the density estimates converge (in a certain sense) to the true class-conditional density functions, and the prior probabilities are properly selected, the error of the resulting classifier tends to the Bayes error.

In the Parzen windows method, the design of the classifier involves selecting the window function and a suitable window length $h_n$. One possibility is to let them depend on the training data. With respect to the general formulation of density estimation (2.6), the k-NN method instead fixes $k_n$ and computes a suitable (small enough) $V_n$ based on the selected $k_n$. To be more precise, we center the cell $R_n$ at the test point $x$ and let it grow until it encircles $k_n$ training samples, in order to estimate $p(x)$. These training samples are the $k_n$ nearest neighbors of $x$. Here, $k_n$ is a given parameter. The $k_n$-nearest neighbor (KNN) density estimate is given by

$$\hat{p}_n(x) = \frac{k_n}{n V_n} \qquad (2.12)$$

where $V_n$ is the volume of the smallest possible ball centered at $x$ that contains $k_n$ training samples, and $n$ is the total number of training samples. If the true density is high in the proximity of $x$, the cell $R_n$ is small and the density estimate is accurate. If the true density is low in the proximity of $x$, the cell $R_n$ is large and the density estimate is less accurate.

Let $p(x)$ be continuous at $x$. The KNN density estimate converges if

i. $k_n \to \infty$ when $n \to \infty$, and

ii. $k_n / n \to 0$ when $n \to \infty$.

One of the properties of the KNN density estimate is that, although it is continuous, its partial derivatives are not necessarily continuous.
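In one dimension the "ball" in (2.12) is an interval, so the KNN estimate is easy to sketch. The restriction to scalar data and the function name are assumptions made for illustration.

```python
def knn_density_1d(x, samples, k):
    """k-NN density estimate (2.12) for scalar data: k / (n * V).

    In one dimension the smallest "ball" centred at x that contains k samples
    is an interval whose length is twice the distance to the k-th nearest sample.
    """
    if not 1 <= k <= len(samples):
        raise ValueError("k must lie between 1 and the number of samples")
    distances = sorted(abs(x - s) for s in samples)
    volume = 2.0 * distances[k - 1]
    if volume == 0.0:
        return float("inf")  # k samples coincide with x: the estimate is unbounded
    return k / (len(samples) * volume)

data = [0, 1, 2, 3, 4]
print(knn_density_1d(2.0, data, 3), knn_density_1d(10.0, data, 1))
```

At $x = 2$, inside the data, the interval needed to capture three samples is short and the estimate is 0.3; at $x = 10$, far from the data, the interval must grow to length 12 and the estimate drops to $1/60$, illustrating how the cell adapts to the local density.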

Two big issues with the k-NN method are the choice of $k_n$ and the choice of the distance metric. In practice, $k_n$ is often chosen empirically and the Euclidean distance is used. However, other distance measures that incorporate prior knowledge can significantly enhance the classification performance [31].

One of the biggest disadvantages of both the Parzen windows and the k-NN methods is that all training samples need to be retained in order to classify any given data. For large data sets, this leads to huge memory requirements and slow computation. To resolve this problem, several data reduction techniques, such as editing [32] and condensing [33], have been proposed. A branch and bound method [34] was also proposed to speed up the k-NN method. For the Parzen windows method, Silverman [35] proposed using a Fourier transform to speed up the computation of kernel functions for univariate density estimation. In the multivariate case, procedures for approximating the kernel density using a reduced number of kernels were described by Fukunaga and Hayes [36] and Babich and Camps [37].



2.2.3 Linear Discriminant Functions and Classifiers 

Parametric and non-parametric density estimation techniques find the decision 
boundaries by first estimating the probability distribution of the patterns belonging to 
each class. In the discriminant-based approach, the decision boundary is constructed 
explicitly. Knowledge of the form of the probability distribution is not required. 

A discriminant function is a function of the pattern $x$ that leads to a classification rule. For example, in a two-class problem, a discriminant function $h(x)$ is a function for which

$$h(x) > k \Rightarrow x \in \Omega_1, \qquad h(x) < k \Rightarrow x \in \Omega_2 \qquad (2.11)$$

for a constant $k$. In the case of equality, $h(x) = k$, the pattern $x$ may be assigned arbitrarily to one of the two classes. An optimal discriminant function for the two-class case is

$$h(x) = \frac{p(x \mid \Omega_1)}{p(x \mid \Omega_2)} \qquad (2.12)$$

with $k = p(\Omega_2)/p(\Omega_1)$. Discriminant functions are not unique. If $f$ is a monotonically increasing function, then

$$g(x) = f(h(x)) > k' \Rightarrow x \in \Omega_1, \qquad g(x) = f(h(x)) < k' \Rightarrow x \in \Omega_2$$

where $k' = f(k)$.

In the $C$-group case we define $C$ discriminant functions $g_i(x)$ such that

$$g_i(x) > g_j(x) \Rightarrow x \in \Omega_i, \qquad j = 1, \ldots, C;\ j \neq i \qquad (2.13)$$

That is, a pattern is assigned to the class with the largest discriminant. For two classes, a single discriminant function

$$h(x) = g_1(x) - g_2(x) \qquad (2.14)$$

with $k = 0$ reduces to the two-class case given by (2.11).

In linear discriminant analysis, for the two-class classification problem, we seek a weight vector $w \in \mathbb{R}^d$ and a bias $b \in \mathbb{R}$ such that

$$w^T x + b > 0 \Rightarrow x \in \Omega_1, \qquad w^T x + b < 0 \Rightarrow x \in \Omega_2 \qquad (2.15)$$

In the $c$-class case ($c > 2$), a data point $x$ is assigned to the class $k$ that satisfies

$$g_k(x) = \max_j g_j(x) \qquad (2.16)$$

where $g_j(x) = w_j^T x + b_j$, $w_j \in \mathbb{R}^d$ and $b_j \in \mathbb{R}$, $j = 1, 2, \ldots, c$.

Compared with the Parzen windows and k-NN methods, this approach has the advantages of a small memory requirement and fast prediction, since only a few parameters $(w_j, b_j)$ need to be used in prediction.
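Rule (2.16) is a one-line computation once the parameters are known. In the sketch below the weights and biases are hypothetical values for $c = 3$ classes in $d = 2$ dimensions, classes are indexed from 0, and ties go to the lowest index.

```python
def linear_machine(x, weights, biases):
    """Assign x to the class k with the largest g_k(x) = w_k^T x + b_k, as in (2.16)."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical parameters: c = 3 classes, d = 2 features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
B = [0.0, 0.0, 0.0]
print(linear_machine((2, 1), W, B), linear_machine((0, 3), W, B), linear_machine((-1, -1), W, B))
```

Only $c(d + 1) = 9$ numbers are stored, however many training samples were used to obtain them; this is the memory advantage over Parzen windows and k-NN.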

A common feature of the various existing linear discriminant methods is that the parameters (the weights and the biases) are found by optimizing some criterion. An obvious criterion is to minimize the number of classification errors on the training samples. Unfortunately, this criterion is difficult to handle because the number of errors is an integer, a piecewise-constant function of the parameters. A simple extension is the perceptron criterion [38], which minimizes the sum of $|w^T x + b|$ over the misclassified samples, a quantity proportional to their total distance from the decision boundary. For the two-class classification problem, the perceptron criterion is

$$\min_{w, b} J_p(w, b) = \sum_{x \in M} \left| w^T x + b \right| \qquad (2.17)$$

where $w \in \mathbb{R}^d$ is a weight vector, $b \in \mathbb{R}$ is a bias, $M$ is the set of misclassified samples, and $|\cdot|$ is the absolute value. The perceptron has a very simple error-correction algorithm [39] for training. This training procedure cycles through the training samples, modifying the weight vector whenever a sample is misclassified, that is,

$$w(k+1) = w(k) + \eta\, x_j \qquad (2.18)$$

where $x_j$ is a training sample that has been misclassified by the weights $w(k)$, $\eta$ is the step size, and $k$ is the iteration number. In (2.18) the samples are taken in augmented form (a constant 1 is appended so that the bias is absorbed into $w$), and the samples of $\Omega_2$ are multiplied by $-1$, so that every correctly classified sample satisfies $w^T x_j > 0$. It can be proven that as long as the training samples




are linearly separable (i.e., the two classes can be separated by a hyperplane, a straight line in two dimensions), the error-correction algorithm converges to a solution in a finite number of iterations [39]. Nevertheless, even though the resulting hyperplane perfectly separates the training data, it cannot guarantee the quality of prediction on (additional) test samples. Therefore, a separating hyperplane with a certain "margin" [40] relative to the boundary points was proposed to ease this problem (this "margin" idea was also used in developing support vector machines [41]). If the training samples are not linearly separable, i.e., for hard problems, the error-correction algorithm does not converge. Various methods [40] have been proposed to handle this problem, including "early stopping", relaxation, and linear programming approaches. The perceptron can be generalized to handle the c-class classification problem by using "Kesler's construction" [39].
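The error-correction procedure (2.18) can be sketched as follows. Assumptions made for illustration: labels are $+1$ for $\Omega_1$ and $-1$ for $\Omega_2$, each sample is augmented with a leading 1 and multiplied by its label (so a correct classification means $w^T y > 0$), the weights start at zero, and a sample lying exactly on the boundary counts as misclassified so that training can leave the zero vector. The function name and the epoch limit are hypothetical.

```python
def train_perceptron(samples, labels, eta=1.0, max_epochs=100):
    """Perceptron error-correction training for two classes labelled +1 / -1.

    Returns (w, converged), where w[0] is the bias and w[1:] the weights.
    """
    d = len(samples[0])
    w = [0.0] * (d + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(samples, labels):
            y = [t * v for v in [1.0, *x]]  # augmented, sign-normalised sample
            if sum(wi * yi for wi, yi in zip(w, y)) <= 0:  # misclassified or on boundary
                w = [wi + eta * yi for wi, yi in zip(w, y)]  # w(k+1) = w(k) + eta * y
                errors += 1
        if errors == 0:  # a full pass without mistakes: the samples are separated
            return w, True
    return w, False

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(train_perceptron(X, [-1, -1, -1, 1]))  # logical AND: linearly separable
print(train_perceptron(X, [-1, 1, 1, -1]))   # XOR: not linearly separable
```

The AND problem converges after a few passes, whereas for XOR every pass contains at least one error, so the loop stops only at the epoch limit; this is the non-convergence on hard problems described above, and the epoch limit is a crude form of early stopping.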

The least squared error is another popular criterion. In this case, it is assumed that each sample $x_i$ has a target value $t_i$; for example, $t_i = 1$ if $x_i \in \Omega_1$ and $t_i = 0$ if $x_i \in \Omega_2$. The least squared error approach aims to minimize the sum of the squared differences between the target values and the decision function outputs. For the two-class classification problem, the least squared error criterion is

$$\min_{w, b} J_s(w, b) = \sum_{i=1}^{n} \left( t_i - (w^T x_i + b) \right)^2 \qquad (2.19)$$

This criterion has a very nice property [3]. If we set the target values such that $t_i = 1$ if $x_i \in \Omega_1$ and $t_i = -1$ if $x_i \in \Omega_2$, then, as the number of training samples increases to infinity, the output of the decision function $g(x) = w^T x + b$ asymptotically approaches, in the mean-squared-error sense, the difference between the posterior probabilities, $p(\Omega_1 \mid x) - p(\Omega_2 \mid x)$. Therefore, approximate values of the posterior probabilities can be computed from $p(\Omega_1 \mid x) \approx (1 + g(x))/2$ and $p(\Omega_2 \mid x) \approx (1 - g(x))/2$. Unlike the perceptron, the least squared error rule does not necessarily produce a separating solution, even when the classes are linearly separable. Modifications of the least squared error rule have been proposed (e.g., the Ho-Kashyap procedure [42]) to remedy this problem, but then the optimal approximation to the posterior probabilities is no longer achieved. Another interesting property of the least squared error rule is that, with a proper choice of the target values $t_i$, the rule leads to the same result as Fisher's rule [43], whose basic idea is to find the direction along which the two classes are best separated. Furthermore, the least squared error rule can easily be extended to the multi-class case. The least squared error rule is the most popular criterion used for feed-forward neural networks.
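For scalar samples, criterion (2.19) has the familiar closed-form solution of simple linear regression. The sketch below assumes one-dimensional $x$ and targets $+1/-1$, and then forms the approximate posteriors $(1 \pm g(x))/2$ described above; the function names are illustrative.

```python
def fit_least_squares_1d(xs, ts):
    """Closed-form minimiser of (2.19) for scalar samples: returns (w, b) of g(x) = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    w = sxt / sxx            # slope of the least-squares line
    b = mean_t - w * mean_x  # intercept: the fitted line passes through the means
    return w, b

def posterior_estimates(x, w, b):
    """Approximate (p(Omega_1|x), p(Omega_2|x)) as ((1 + g)/2, (1 - g)/2) for targets +1/-1."""
    g = w * x + b
    return (1 + g) / 2, (1 - g) / 2

w, b = fit_least_squares_1d([0, 1, 2, 3], [-1, -1, 1, 1])
print(w, b, posterior_estimates(1.5, w, b))
```

Here $g(x) = 0.8x - 1.2$, so the decision boundary is at $x = 1.5$, where both approximate posteriors are 0.5. The values are only approximations: at $x = 0$ the first one is $-0.1$, outside $[0, 1]$, because a linear $g(x)$ cannot follow a posterior probability exactly.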

For some difficult problems, a nonlinear decision boundary is required in order to obtain a good separation of the different classes. One important generalization of linear discriminant analysis is the generalized linear discriminant function. The basic idea is to map the data nonlinearly from the original $d$-dimensional space to some $\tilde{d}$-dimensional space, and then apply linear discrimination in the mapped space. For the $c$-class classification problem, the discriminant functions become

$$g_j(x) = \sum_{i=1}^{\tilde{d}} w_{ji}\, \phi_i(x) + b_j, \qquad j = 1, 2, \ldots, c \qquad (2.20)$$

where $w_{ji} \in \mathbb{R}$ is a weight, $b_j \in \mathbb{R}$ is a bias, and $\phi: \mathbb{R}^d \to \mathbb{R}^{\tilde{d}}$ is a nonlinear mapping, often called the "basis function". Hopefully, the transformed samples $\phi(x)$ are more linearly separable in the mapped space than the samples $x$ were in the original space. Specht [44] and Flasinski and Lewichi [45] proposed the use of polynomial functions as the basis functions. The problem with polynomial functions, however, is that the number of terms increases rapidly with the order of the polynomial.
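A small example shows both the benefit and the cost of a polynomial basis. Suppose, hypothetically, that class $\Omega_1$ consists of scalar samples with $|x| < 1$ and $\Omega_2$ of samples with $|x| > 1$. No single threshold on $x$ separates them, but after the mapping $\phi(x) = (x, x^2)$ the linear function $g(x) = 1 - x^2$ (hand-chosen weights $(0, -1)$, bias 1) does. The number of monomials of degree at most $r$ in $d$ variables, $\binom{d + r}{r}$, shows how quickly a full polynomial basis grows.

```python
from math import comb

def phi(x):
    """Hypothetical degree-2 polynomial basis for scalar x: phi(x) = (x, x^2)."""
    return (x, x * x)

def g_mapped(x, w=(0.0, -1.0), b=1.0):
    """Two-class form of (2.20): g(x) = w . phi(x) + b; g(x) > 0 means Omega_1."""
    return sum(wi * fi for wi, fi in zip(w, phi(x))) + b

def num_poly_terms(d, r):
    """Number of monomials of degree <= r in d variables, constant term included."""
    return comb(d + r, r)

print([g_mapped(x) > 0 for x in (-2.0, -0.5, 0.5, 2.0)])
print(num_poly_terms(2, 2), num_poly_terms(10, 3), num_poly_terms(100, 3))
```

A full quadratic basis in two variables has 6 terms, but a cubic basis already has 286 terms for $d = 10$ and 176,851 terms for $d = 100$, each requiring its own weight.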

2.3 Structural Methods

Structural pattern recognition [46,47,48], sometimes referred to as syntactic pattern
recognition due to its origins in formal language theory, relies on syntactic grammars to
discriminate among data from different groups based upon the morphological
interrelationships or interconnections present within the data. Structural pattern
recognition techniques are effective for data that contain an inherent, identifiable
organization, such as image data and time series data. The usefulness of structural pattern
recognition systems, however, is limited as a consequence of fundamental complications
associated with the implementation of the description and classification tasks. In many
recognition problems involving complex patterns, it is more appropriate to adopt a
hierarchical perspective where a pattern is viewed as being composed of simple
subpatterns which are themselves built from yet simpler subpatterns. The elementary
subpatterns to be recognized are called primitives, and the given complex pattern is
represented in terms of the interrelationships between these primitives. In syntactic
pattern recognition, a formal analogy is drawn between the structure of patterns and the
syntax of a language. The patterns are viewed as sentences belonging to a language,
primitives are viewed as the alphabet of the language, and the sentences are generated
according to a grammar. Thus, a large collection of complex patterns can be described by
a small number of primitives and grammatical rules. The grammar for each pattern class
must be inferred from the available training samples. Structural pattern recognition is
intuitively appealing because, in addition to classification, this approach also provides a
description of how the given pattern is constructed from the primitives. This paradigm
has been used in situations where the patterns have a definite structure which can be
captured in terms of a set of rules, such as textured images and the shape analysis of contours [48].
The implementation of a syntactic approach, however, leads to many difficulties which
primarily have to do with the segmentation of noisy patterns and the inference of the
grammar from training data. Fu [46] introduced the notion of attributed grammars, which
unify syntactic and statistical pattern recognition. The syntactic approach may yield a
combinatorial explosion of possibilities to be investigated, demanding large training sets
and very large computational efforts [49].
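To make the language analogy concrete, here is a minimal Python sketch under our own assumptions: a sampled waveform is encoded as a string over a hypothetical three-symbol alphabet of primitives (`u` for a rising step, `d` for a falling step, `f` for a flat step), and a regular grammar, written as a regular expression, accepts the class of "single peak" patterns. The tolerance `tol`, the alphabet and the grammar are illustrative only and are not taken from [46-49]; practical syntactic systems typically use richer primitives and context-free or attributed grammars.

```python
import re


def to_primitives(samples, tol=0.1):
    """Encode a sampled waveform as a sentence over the alphabet {u, d, f}."""
    symbols = []
    for prev, curr in zip(samples, samples[1:]):
        delta = curr - prev
        if delta > tol:
            symbols.append("u")      # rising primitive
        elif delta < -tol:
            symbols.append("d")      # falling primitive
        else:
            symbols.append("f")      # flat primitive
    return "".join(symbols)


# Regular grammar for the "peak" class: optional flat lead-in, one or more rises,
# an optional plateau, one or more falls, and an optional flat tail.
PEAK = re.compile(r"f*u+f*d+f*")


def is_peak(samples, tol=0.1):
    """Parse the primitive string; the pattern belongs to the class if the grammar accepts it."""
    return PEAK.fullmatch(to_primitives(samples, tol)) is not None


print(to_primitives([0, 1, 2, 2, 1, 0]), is_peak([0, 1, 2, 2, 1, 0]))
print(to_primitives([0, 1, 0, 1, 0]), is_peak([0, 1, 0, 1, 0]))
```

Besides the class decision, the primitive string itself is a description of how the pattern is built, which is the extra benefit of the structural approach noted above.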

Structural pattern recognition systems are difficult to apply to new domains because 
implementation of both the description and classification tasks requires domain 
knowledge. Knowledge acquisition techniques necessary to obtain domain knowledge 
from experts are tedious and often fail to produce a complete and accurate knowledge 
base. Consequently, applications of structural pattern recognition have been primarily
restricted to domains in which the set of useful morphological features has been
established in the literature and the syntactic grammars can be composed by hand. To
overcome this limitation, a domain independent approach to structural pattern recognition
is needed: one that is capable of extracting morphological features and performing classification
without relying on domain knowledge. A hybrid system that employs a statistical
classification technique to perform discrimination based on structural features is a natural 
solution. While a statistical classifier is inherently domain independent, the domain 
knowledge necessary to support the description task can be eliminated with a set of 
generally useful morphological features. Such a set of morphological features is 
suggested as the foundation for the development of a suite of structure detectors to 
perform generalized feature extraction for structural pattern recognition in time series 
data. 

Identification problems involving timeseries (or waveform) data constitute a subset of 
pattern recognition applications that is of particular interest because of the large number 
of domains that involve such data [50]. Both statistical and structural approaches can be 
used for pattern recognition of timeseries data: standard statistical techniques have been 
established for discriminant analysis of timeseries data, and structural techniques have 
been shown to be effective in a variety of domains involving timeseries data. 

A domain independent structural pattern recognition system is one that is capable of 
acting as a “black box” to extract primitives and perform classification without the need 
for domain knowledge. Such a system would automatically describe and classify data, 
thereby eliminating the overhead associated with traditional approaches to structural
pattern recognition. A domain independent structural pattern recognition system for
timeseries data must incorporate techniques for the description and classification tasks
that are not dependent on domain knowledge, i.e., generalized description and
generalized classification. Since syntactic grammars are inherently tied to the domain and 
application, a sensible approach to generalized classification for timeseries data is a 
statistical classifier that performs discrimination based on structural features extracted 
from the data. Generalized description can be implemented using a foundation of 
generally useful morphological features that are effective regardless of the domain. 

The field of signal processing offers a suggestion for morphological features that can 
provide the foundation for generalized description of timeseries data. Six fundamental 
types of modulation commonly used in signal processing systems — constant, straight, 
exponential, sinusoidal, triangular, and rectangular — entail morphologies deliberately
introduced into a continuous medium with the intent of conveying information regardless 
of the domain or application [51,52,53,54]. Moreover, these six modulation types 
subsume the small set of domain independent morphological features commonly 
extracted by structural pattern recognition systems — straight lines, parabolas, and peaks. 
A suite of feature extractors which identify morphological features based on these six 
modulation types, therefore, would constitute a first pass at implementing generalized 
feature extraction to support domain independent structural pattern recognition in 
timeseries data. 
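A rough Python sketch of the idea follows, under simplifying assumptions of our own: each hypothetical "detector" fits one morphology to a segment by least squares, and the morphology with the smallest sum of squared errors (SSE) is reported. Only three of the six modulation types (constant, straight and exponential) are covered, and the exponential is fitted through a log-linear regression, which requires positive data and ignores any offset. This is not an implementation of any published structure detector.

```python
import numpy as np


def fit_constant(t, y):
    """Constant morphology: the least-squares constant is the mean."""
    return np.full_like(y, y.mean())


def fit_straight(t, y):
    """Straight morphology: least-squares line y = slope * t + intercept."""
    slope, intercept = np.polyfit(t, y, 1)
    return slope * t + intercept


def fit_exponential(t, y):
    """Exponential morphology y = a * exp(b * t), fitted linearly on log(y) (positive data only)."""
    if np.any(y <= 0):
        return None
    b, log_a = np.polyfit(t, np.log(y), 1)
    return np.exp(log_a + b * t)


# Hypothetical detector suite, listed in order of increasing complexity.
DETECTORS = {"constant": fit_constant, "straight": fit_straight, "exponential": fit_exponential}


def best_structure(t, y):
    """Return (name, SSE) of the detector that best approximates the segment.
    Ties go to the detector tried first, i.e. the simpler morphology."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    best_name, best_sse = None, np.inf
    for name, fit in DETECTORS.items():
        y_hat = fit(t, y)
        if y_hat is None:
            continue
        sse = float(np.sum((y - y_hat) ** 2))
        if sse < best_sse:
            best_name, best_sse = name, sse
    return best_name, best_sse


t = np.arange(20)
print(best_structure(t, np.full(20, 3.0))[0])
print(best_structure(t, 2.0 * t + 1.0)[0])
print(best_structure(t, 2.0 * np.exp(0.3 * t))[0])
```

Applied piecewise to consecutive segments, the sequence of winning names and fitted parameters would form a domain independent structural description that a statistical classifier could consume.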

Structural approaches, while supported by psychological evidence which suggests that
structure based description and classification parallels that of human perceptual and
cognitive processes, have not yet been developed to their fullest potential due to
fundamental complications associated with implementing structural pattern recognition
systems. Shiavi and Bourne [55] summarize the problems of applying structural methods
for pattern recognition within the context of analyzing biological waveforms: “There are
obvious problems with the use of [structural techniques]. First, rather deep knowledge
about the problem is required in order to successfully identify features and write
[grammar] rules. While it is conceptually interesting to consider the possibility of using
some automated type of grammatical inference to produce the rules, in practice no
technique of grammatical inference has proved robust enough to be used with real
problems involving biological waveforms. Hence, the writing of rules is incumbent on
the designer of the analysis system. Similarly, the selection of features must be
accomplished essentially by hand since automated techniques usually cannot provide the
guidance necessary to make a useful feature selection. Second, the control strategy of
typical parsing systems is relatively trivial and cannot deal with very difficult problems.
Typical parsing techniques consist of simple repeated application of a list of rules, which
is often equivalent to forward chaining, an elementary concept in knowledge based rule
systems. Formation of a robust control strategy for guiding syntactic parsing of strings
appears somewhat problematic.” There is no general solution for extracting structural
features from data. Friedman [56] addresses the issue by saying, “The selection of
primitives by which the patterns of interest are going to be described depends upon the
type of data and the associated application.” Nadler [57] seems to support this position
when he states, “...features are generally designed by hand, using the experience,
intuition, and/or cleverness of the designer.” The lack of a general approach for
extracting primitives puts designers of structural pattern recognition systems in an
awkward position: feature extractors are necessary to identify primitives in the data, and
yet there is no established methodology for deciding which primitives to extract. The
result is that feature extractors for structural pattern recognition systems are developed to
extract either the simplest and most generic primitives possible, or the domain and
application specific primitives that best support the subsequent classification task.

2.4 Neural Network Based Methods 

The pattern recognition approaches discussed so far are based on direct computation
through machines. Direct computations are based on statistical analysis techniques. The
neural approach applies biological concepts to machines for pattern recognition. The
outcome of this effort is the invention of artificial neural networks. Neural networks can be
viewed as massively parallel computing systems consisting of an extremely large number
of simple processors with many interconnections. Neural network models attempt to use
some organizational principles (such as learning, generalization, adaptivity, fault
tolerance, distributed representation, and computation) in a network of weighted directed
graphs in which the nodes are artificial neurons and directed edges (with weights) are
connections between neuron outputs and neuron inputs. The main characteristics of
neural networks are that they have the ability to learn complex nonlinear input-output
relationships, use sequential training procedures, and adapt themselves to the data. The
most commonly used family of neural networks for pattern classification tasks [58] is the
feed-forward network, which includes multilayer perceptron and Radial-Basis Function
(RBF) networks. These networks are organized into layers and have unidirectional
connections between the layers. Another popular network is the Self-Organizing Map
(SOM), or Kohonen network [59], which is mainly used for data clustering and feature
mapping. The learning process involves updating network architecture and connection
weights so that a network can efficiently perform a specific classification/clustering task.
The increasing popularity of neural network models to solve pattern recognition problems 
has been primarily due to their seemingly low dependence on domain-specific knowledge 
(relative to model-based and rule-based approaches) and due to the availability of 
efficient learning algorithms. Neural networks provide a new suite of nonlinear 
algorithms for feature extraction (using hidden layers) and classification (e.g., multilayer 
perceptrons). In addition, existing feature extraction and classification algorithms can 
also be mapped on neural network architectures for efficient (hardware) implementation. 
In spite of the seemingly different underlying principles, most of the well known neural 
network models are implicitly equivalent or similar to classical statistical pattern 
recognition methods. Ripley [60] and Anderson et al. [61] also discuss the relationship 
between neural networks and statistical pattern recognition. Most neural networks
conceal the statistics from the user. Despite these similarities, neural networks do offer
several advantages, such as unified approaches for feature extraction and classification and
flexible procedures for finding good, moderately nonlinear solutions. A neural network consists of
a massive number of simple processing units with a high degree of interconnection between the units.
The processing units work cooperatively with each other and achieve massively parallel
distributed processing. The design and function of neural networks simulate some
functionality of biological brains and nervous systems. The advantages of neural
networks are their adaptive-learning, self-organization and fault-tolerance capabilities.
Because of these capabilities, neural networks are used for pattern recognition
applications. The goal in pattern recognition is to use a set of example solutions to some
problem to infer an underlying regularity which can subsequently be used to solve new
instances of the problem. Examples include hand-written digit recognition, medical
image screening and fingerprint identification. In the case of feed-forward networks, the
set of example solutions (called a training set) comprises sets of input values together
with corresponding sets of desired output values. The training set is used to determine an 
error function in terms of the discrepancy between the predictions of the network, for 
given inputs, and the desired values of the outputs given by the training set. A common 
example of an error function would be the squared difference between desired and actual 
output, summed over all outputs and summed over all patterns in the training set. The 
learning process then involves adjusting the values of the parameters to minimize the
value of the error function. When the desired outputs are the input patterns themselves,
this kind of feedback can be used to reconstruct the input patterns and make them free
from error, thus increasing the performance of the neural network. Such networks are
called auto-associative neural networks, and they are commonly trained with the
back-propagation algorithm. Historically, however, effective learning
algorithms were only known for the case of networks in which at most one of the layers
comprised adaptive interconnections. Such networks were known variously as
perceptrons [38] and Adalines [62], and were seriously limited in their capabilities [63].
Research into artificial neural networks was stimulated during the 1980s by the
development of new algorithms capable of training networks with more than one layer of
adaptive parameters [64].
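The training procedure described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions (one hidden layer of logistic sigmoid units, the sum-of-squares error $E = \frac{1}{2}\sum_n \sum_k \big(y_k(\mathbf{x}^n) - t_k^n\big)^2$, batch gradient descent with a fixed learning rate, and the XOR problem as toy data); it is not the specific network of any reference cited here. The gradients of $E$ are obtained by back-propagating the output errors through the hidden layer, which is what allows more than one layer of adaptive weights to be trained.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def sum_squared_error(Y, T):
    """E = 1/2 * sum over all patterns and all outputs of (y - t)^2."""
    return 0.5 * float(np.sum((Y - T) ** 2))


def train_mlp(X, T, n_hidden=4, lr=0.5, epochs=5000, seed=0):
    """Train a one-hidden-layer perceptron by batch gradient descent on the sum-of-squares error."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    history = []
    for _ in range(epochs):
        # Forward pass through the hidden and output layers.
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        history.append(sum_squared_error(Y, T))
        # Backward pass: dE/da at the outputs, then propagated back to the hidden units.
        delta_out = (Y - T) * Y * (1.0 - Y)
        delta_hid = (delta_out @ W2.T) * H * (1.0 - H)
        # Gradient-descent updates of weights and biases.
        W2 -= lr * H.T @ delta_out
        b2 -= lr * delta_out.sum(axis=0)
        W1 -= lr * X.T @ delta_hid
        b1 -= lr * delta_hid.sum(axis=0)
    return (W1, b1, W2, b2), history


def predict(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)


# XOR: a problem that a network with a single layer of adaptive weights cannot solve.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])
params, history = train_mlp(X, T)
print("error: first epoch %.4f, last epoch %.4f" % (history[0], history[-1]))
print(np.round(predict(params, X), 2).ravel())
```

The learning rate, number of hidden units and number of epochs are arbitrary choices; a different random seed may need more epochs to escape the initial plateau of the error surface.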
2.5 Fuzzy Based Methods

Pattern recognition tasks are ideally suited to fuzzy theory, as patterns are very often inexact,
ambiguous, or corrupted. In handwritten character recognition, for example, the
patterns to be recognized are not well defined and are ambiguous in many cases. Fuzzy
rules for handwritten character recognition are given in Yamakawa [65]. Pattern
recognition and classification are usually considered as very similar tasks. Classes can be
described by fuzzy classification rules. Fuzzy rules can be used for classification
purposes when the objects to be classified are noisy, corrupted, blurry, etc. Fuzzy rules
can cope with those ambiguities. Contradictory fuzzy rules can be accommodated in one
system, the tradeoff being achieved through the inference mechanism. This characteristic
of fuzzy systems is very important for their applications in solving classification
problems. Classification problems, when data examples labeled with class labels are
available, have been successfully solved by classic statistical methods, especially when
the data set is unambiguous and dense. If the data set is sparse in the problem state space,
the problem is rather difficult. Consider the following fuzzy classification rules over two
input attributes A1 and A2, where S, M and L are fuzzy labels (e.g., small, medium and large)
and the confidence factors (CF) represent the percentage of instances of a given class
which fall in a particular "patch" of the problem space:

Rule 1: IF A1 is M and A2 is S, THEN Class1 (CF = 1)

Rule 2: IF A1 is L and A2 is S, THEN Class1 (CF = 0.45)

Rule 3: IF A1 is L and A2 is S, THEN Class2 (CF = 0.55)

Rule 4: IF A1 is L and A2 is M, THEN Class2 (CF = 0.75)

Rule 5: IF A1 is L and A2 is M, THEN Class1 (CF = 0.25)

The above set of rules is ambiguous and contradictory. Rules 2 and 3 have the same
antecedents but different consequents, which is also true for rules 4 and 5. This
situation is impossible for a symbolic production system to cope with. A fuzzy
production system can infer a proper classification, as for every input data vector all rules
may fire to some degree and all of them contribute to the final solution to different
degrees. Points at the border between the two classes should be correctly classified
by the fuzzy rules, as they take support not only from one of the contradictory rules but
also from rules with neighboring (lateral) fuzzy labels.
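The following Python sketch shows how Rules 1 to 5 can all fire at once. The triangular membership functions for S, M and L on an assumed input range of [0, 10], the use of min as the fuzzy AND, the weighting of each rule's firing strength by its CF, and the summation of the weighted strengths per class are all our own illustrative choices; other inference schemes (for example, taking the maximum instead of the sum) are equally common.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy labels on an assumed input range [0, 10].
LABELS = {"S": (-5.0, 0.0, 5.0), "M": (0.0, 5.0, 10.0), "L": (5.0, 10.0, 15.0)}

# Rules 1-5 from the text: (label of A1, label of A2, class, confidence factor).
RULES = [
    ("M", "S", "Class1", 1.00),
    ("L", "S", "Class1", 0.45),
    ("L", "S", "Class2", 0.55),
    ("L", "M", "Class2", 0.75),
    ("L", "M", "Class1", 0.25),
]


def classify(a1, a2):
    """Fire every rule to some degree and accumulate CF-weighted support per class."""
    support = {"Class1": 0.0, "Class2": 0.0}
    for lab1, lab2, cls, cf in RULES:
        strength = min(triangular(a1, *LABELS[lab1]), triangular(a2, *LABELS[lab2]))
        support[cls] += cf * strength
    return max(support, key=support.get), support


print(classify(10.0, 0.0))   # only rules 2 and 3 fire
print(classify(7.5, 2.5))    # border point: all five rules contribute
```

For the input (10, 0) the contradictory rules 2 and 3 are resolved in favour of Class2 (0.55 against 0.45); for the border point (7.5, 2.5) every rule fires with strength 0.5 and Class1 wins with a support of 0.85 against 0.65.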

When the data examples are not labeled with the class or pattern labels, that is, when the 
classes they belong to are not known, then different clustering techniques can be applied 
to find the groups, the clusters, in which data examples are grouped. The clusters show 
the typical patterns, the similarity, and the ambiguity in the data set. In addition to
exact (crisp) clustering, fuzzy clustering can be applied too.

Fuzzy clustering is a procedure of clustering data into possibly overlapping clusters, such
that each of the data examples may belong to each of the clusters to a certain degree. The
procedure aims at finding the cluster centers $V_i$ ($i = 1, 2, \dots, c$) and the cluster
membership functions $\mu_i$ which define to what degree each of the $n$ examples belongs to
the $i$th cluster. The number of clusters $c$ is either defined a priori (supervised type of
clustering) or chosen by the clustering procedure (unsupervised type of clustering). The
result of a clustering procedure can be represented as a fuzzy relation $\mu_{i,k}$ such that:

$$\sum_{i=1}^{c} \mu_{i,k} = 1 \quad \text{for each } k = 1, 2, \dots, n$$

(the total membership of an instance $k$ in all the clusters equals 1), and

$$\sum_{k=1}^{n} \mu_{i,k} > 0 \quad \text{for each } i = 1, 2, \dots, c$$

(there are no empty clusters).

A widely used algorithm for fuzzy clustering is the fuzzy C-means algorithm suggested by J.
Bezdek [66,67]. Fuzzy clustering is an important data analysis technique. It helps to
better understand the ambiguity in data. It can be used to direct the way other techniques
for information processing are used afterward. For example, the structure of a neural
network to be used for learning from a data set can be defined to a great extent after
knowing the optimal number of fuzzy clusters.
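A compact NumPy sketch of the fuzzy C-means iteration is given below. It alternates the standard center update $V_i = \sum_k \mu_{i,k}^m \mathbf{x}_k / \sum_k \mu_{i,k}^m$ and membership update $\mu_{i,k} = 1 / \sum_{j=1}^{c} \big(\lVert \mathbf{x}_k - V_i \rVert / \lVert \mathbf{x}_k - V_j \rVert\big)^{2/(m-1)}$, so every column of memberships sums to 1 as required above. The fuzzifier $m = 2$, the random initialization, the stopping tolerance and the toy data are assumptions for illustration, not settings taken from [66,67].

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Return cluster centers V (c x d) and memberships U (c x n), where U[i, k] = mu_{i,k}."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized so that each example's memberships sum to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Center update: membership-weighted mean of the examples.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Distances D[i, k] = ||x_k - v_i||, floored to avoid division by zero.
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        D = np.fmax(D, 1e-12)
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        ratio = (D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return V, U


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
V, U = fuzzy_c_means(X, c=2)
print(np.round(V, 2))
print(np.round(U, 3))
```

Even for these well-separated groups each example keeps a small, non-zero membership in the other cluster; for overlapping data these graded memberships are what expose the ambiguity discussed above.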

2.6 Handwritten Character Recognition 

For more than thirty years, researchers have been working on handwriting recognition. 
As in the case of speech processing, they aimed at designing systems able to understand 
personal encoding of natural language. This new stage in the evolution of handwriting 
processing results from a combination of several elements: improvements in recognition 
rates, the use of complex systems integrating several kinds of information, the choice of 
relevant application domains, and new technologies such as high quality high speed 
scanners and inexpensive powerful CPUs. Methods and recognition rates depend on the
level of constraints on handwriting. The constraints are mainly characterized by the types
of handwriting, the number of writers, the size of the vocabulary and the spatial layout.
Obviously, recognition becomes more difficult when the constraints decrease.
Considering the types of Roman script (roughly classified as hand printed, discrete script
and cursive script), the difficulty is lower for handwriting produced as a sequence of
separate characters than for cursive script, which has much in common with continuous
speech recognition. For other writing systems, character recognition is hard to achieve, as
in the case of Kanji, which is characterized by complex shapes and a huge number of
symbols.

The characteristics which constrain handwriting may be combined in order to define
handwriting categories for which the results of automatic processing are satisfactory. The
trade-off between constraints and error rates gives rise to applications in several domains.

These new challenges bring the ongoing studies closer to unconstrained handwritten 
language processing which is the ultimate aim. The reading of all of the handwritten and 
printed information present on a document is necessary to process it automatically, to use 
content dependent criteria to store, access and transmit it and to check its content. 
Automatic handwritten language processing will also allow one to convert and to handle 
manuscripts produced over several centuries within a computer environment. 

Recognition strategies depend heavily on the nature of the data to be recognized. In the
cursive case, the problem is made complex by the fact that the writing is fundamentally
ambiguous, as the letters in the word are generally linked together, poorly written and
may even be missing. On the contrary, hand printed word recognition is more closely related to
printed word recognition, the individual letters composing the word being usually much
easier to isolate and to identify. As a consequence, methods working on a letter
basis (i.e., based on character segmentation and recognition) are well suited to hand
printed word recognition, while cursive scripts require more specific and/or sophisticated
techniques. Inherent ambiguity must then be compensated for by the use of contextual
information.

Intense activity was devoted to the character recognition problem during the seventies
and the eighties, and good results have been achieved [68]. Character recognition
techniques can be classified according to two criteria: the way preprocessing is performed
on the data and the type of the decision algorithm. Preprocessing techniques include
three main categories: the use of global transforms (correlation, Fourier descriptors, etc.),
local comparison (local densities, intersections with straight lines, variable masks,
characteristic loci, etc.) and geometrical or topological characteristics (strokes, loops,
openings, diacritical marks, skeleton, etc.). Depending on the type of preprocessing stage,
various kinds of decision methods have been used, such as statistical methods,
neural networks, structural matching (on trees, chains, etc.) and stochastic processing
(Markov chains, etc.). Many recent methods mix several techniques together in order to
provide better reliability and to compensate for the great variability of handwriting.

Neural network computing has been expected to play a significant role in computer-based
systems for recognizing handwritten characters. This is because a neural network can
be trained quite readily to recognize several instances of a written letter or word, and can
then generalize to recognize other, different instances of that same letter or word. This
capability is vital to the realization of robust recognition of handwritten characters or
scripts, since characters are rarely written twice in exactly the same form.
There have been reports of successful use of neural networks for the recognition of
handwritten characters [69, 70], but we are not aware of any general investigation which
might shed light on a systematic approach to a complete neural network system for the
automatic recognition of cursive characters.

The following is a brief description of several recognition algorithms for the handwritten 
characters. 

Gader et al. [71] reported a methodology using a pipeline strategy for handwritten
numeral recognition which combines one eigenvalue filter, template matching in two
stages (754 templates in the first stage and 657 templates in the second) and 33 digit models.
Correct rates of 94.03-96.39% and substitution error rates of 0.54-1.05% were obtained
on real-world databases. However, this methodology is not ideal for real-time
processing.

Le Cun et al. [70] reported a recognizer based on a multilayer neural network.
They used size-normalized digit images as direct input to the network, allowing
feature extraction and classification to be performed simultaneously by the multilayer network.
The network consists of 16x16 neurons in the input layer, 12x64 neurons in the first hidden
layer, 12x16 neurons in the second hidden layer, 30 neurons in the third hidden layer and
10 neurons in the output layer. In total, there are 1256 neurons (256 + 768 + 192 + 30 + 10)
and 64,660 connections. They obtained an 86% correct classification rate with a 13%
rejection rate, in order to achieve a 1% substitution error rate.
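These figures trade three quantities against each other: for every test pattern the recognizer either answers correctly, answers wrongly (a substitution) or rejects, so the three rates sum to 100% (86% + 13% + 1% in the result above). A minimal Python sketch of how such rates can be measured with a confidence threshold is given below; the confidence score, the threshold sweep and the function names are our own assumptions, not the rejection rule used in [70].

```python
import numpy as np


def reject_option_rates(y_true, y_pred, confidence, threshold):
    """Return (correct, substitution, rejection) rates; patterns with confidence < threshold are rejected."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accepted = np.asarray(confidence, dtype=float) >= threshold
    n = len(y_true)
    correct = np.sum(accepted & (y_pred == y_true)) / n
    substitution = np.sum(accepted & (y_pred != y_true)) / n
    rejection = np.sum(~accepted) / n
    return correct, substitution, rejection


def threshold_for_error(y_true, y_pred, confidence, max_substitution):
    """Lowest candidate threshold (fewest rejections) whose substitution rate is within the target."""
    for t in np.unique(confidence):          # candidate thresholds in ascending order
        rates = reject_option_rates(y_true, y_pred, confidence, t)
        if rates[1] <= max_substitution:
            return t, rates
    return np.inf, (0.0, 0.0, 1.0)           # only rejecting everything meets the target


# Toy example: ten digits, one misclassified with low confidence.
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 8, 0]
conf = [0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.3]
print(reject_option_rates(y_true, y_pred, conf, threshold=0.0))
print(threshold_for_error(y_true, y_pred, conf, max_substitution=0.0))
```

Raising the threshold converts substitutions into rejections; a recognizer is useful at a given error target only if its confidence scores rank wrong answers below right ones.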

Recently, with the rapid development of computer and parallel processing technologies,
researchers have found that recognition performance can be improved significantly by
combining several methodologies. For instance, Suen et al. [72] combined four
algorithms for recognizing unconstrained handwritten numerals. They found that the
combined algorithms complement each other in many ways to reduce the substitution
error rate while maintaining a fairly high correct classification rate. As a result, they obtained a
correct rate of 93.05% and a rejection rate of 6.95% while maintaining a zero error rate.
Cohen et al. and Hull et al. [73] also used four algorithms in parallel. The combined
algorithms are a polynomial discriminant method, a method that relies on statistical and
structural analysis of the contours of digits, a structural classifier and a contour analysis
method.

Finally, regarding the current status of recognition results: in
handwritten digit recognition, the best results have been reported with 85-95% correct
classification rates and 1-5% substitution error rates [70,74,75,76].

2.7 Conclusion

The objective of this chapter was to introduce various pattern recognition problems
and their existing solutions using neural network techniques and other well-known
techniques. Pattern recognition is a fast-moving and proliferating discipline. It is not easy
to form a well-balanced and well-informed summary view of the newest developments in
this field. It is still harder to have a vision of its future progress. In its early stage of
development, statistical pattern recognition focused mainly on the core of the discipline:
the Bayesian decision rule and its various derivatives (such as linear and quadratic
discriminant functions), density estimation, the curse of dimensionality problem, and
error estimation. Due to the limited computing power available in the 1960s and 1970s,
statistical pattern recognition employed relatively simple techniques which were applied
to small-scale problems. Since the early 1980s, statistical pattern recognition has
experienced rapid growth. Its frontiers have been expanding in many directions
simultaneously. This rapid expansion is largely driven by the following forces.

i. Increasing interaction and collaboration among different disciplines, including 
neural networks, machine learning, statistics, mathematics, computer science, and 
biology. These multidisciplinary efforts have fostered new ideas, methodologies, 
and techniques which enrich the traditional statistical pattern recognition 
paradigm. 

ii. The prevalence of fast processors, the Internet, large and inexpensive memory and 
storage. Advanced computer technology has made it possible to implement
complex learning, searching and optimization algorithms which were not feasible a
few decades ago. It also allows us to tackle large-scale real world pattern
recognition problems which may involve millions of samples in high dimensional 
spaces (thousands of features). 

iii. Emerging applications, such as data mining and document taxonomy creation and 
maintenance. These emerging applications have brought new challenges that 
foster a renewed interest in statistical pattern recognition research. 

iv. Last but not least, the need for a principled, rather than ad hoc, approach for
successfully solving pattern recognition problems in a predictable way. 




Chapter 2: Pattern Recognition: Scope and Methods 


Structural pattern recognition can be a powerful analysis tool within domains 
where a description composed of morphological subpatterns and their interrelationships
is paramount to accurate classification decisions. A structural pattern recognition system 
typically includes feature extractors to identify instances of morphological characteristics 
of the data which, in turn, are used as the basis for classification using syntactic 
grammars. The domain knowledge necessary to guide feature extractor and grammar 
development is gathered using knowledge acquisition techniques. However, such 
techniques are time consuming, inexact, and do not always produce a complete 
knowledge base of the domain. Consequently, structural approaches to pattern 
recognition are difficult to apply to unexplored or poorly understood domains, thus 
limiting them to domains where the feature types and the syntactic grammars have either 
become established in the literature or are obvious upon inspection of the data. 
Eliminating the effort necessary to implement feature extraction and classification for 
structural pattern recognition systems will widen the applicability of structural 
approaches to complex, poorly understood domains. This can be accomplished using 
domain independent techniques for feature extraction and classification. A domain 
independent structural pattern recognition system is one that is capable of extracting 
features and performing classification without the need for domain knowledge. Such a 
system can be implemented using a hybrid approach that incorporates structural features 
with statistical techniques for classification: the structural features retain the 
morphological information necessary for discrimination, while the statistical classifier 
avoids the need to develop syntactic grammars that are inherently domain and application 
specific. The solution to making feature extraction domain independent is to employ
generalized feature extraction to identify instances of morphologies which have proven to
be useful across domains. Structure detectors were implemented to approximate a
timeseries data set with one of six morphologies: constant, straight, exponential,
sinusoidal, triangular, and trapezoidal. A methodology for applying these structure
detectors to a timeseries data set in a piecewise fashion was developed, producing either
a homogeneous or heterogeneous sequence of structures that together best approximate
the entire time series. The effectiveness of these structure detectors in generating
morphological features suitable for classification was assessed against three standard
statistical techniques for feature extraction (the identity, Fourier, and wavelet
transformations) using two databases having markedly different characteristics. The
classification accuracies achieved when using the structure detectors were at least as good 
as (and often superior to) the classification accuracies achieved when using the statistical 
feature extractors. The ability of the structure detectors to generate morphological 
features that result in classification accuracies better than the baseline established by 
commonly used statistical techniques demonstrates that the morphologies identified by 
the suite of structure detectors constitute a useful set of structural feature types. 
Moreover, the classification accuracies achieved on the two disparate databases illustrate 
that the suite of structure detectors is capable of extracting features from data with 
various characteristics. Certainly it is possible to produce better classification accuracies 
with domain and application specific feature extractors developed with the assistance of a 
domain expert, but this is burdensome for well understood domains and impossible for 
domains that are poorly understood. What this suite of structure detectors offers is a
starting point for extracting features which have been shown to be generally effective for 
classification, providing a springboard for domain exploration and the subsequent 
refinement of these structure detectors with the goal of producing a structural pattern 
recognition system targeted to a particular domain and application. 

As discussed, neural networks have the ability to learn complex nonlinear input-output 
relationships, use sequential training procedures, and adapt themselves to the data. 
Neural networks also have several other advantages, such as unified approaches to feature 
extraction and classification and flexible procedures for finding good, moderately 
nonlinear solutions. Neural networks are now well established as an important 
technique for solving pattern recognition problems, and indeed there are already many 
commercial applications of feedforward neural networks in routine use. 
Chapter 3: Soft Computing: Techniques, Models and Applications 

Abstract 

Soft computing is an innovative approach to constructing computationally intelligent 
systems that can exhibit improved performance on pattern recognition tasks. It is now 
realized that complex real-world problems require intelligent systems that combine 
knowledge, techniques, and methodologies from various sources. These intelligent 
systems are expected to possess human-like expertise within a specific domain, adapt 
themselves and learn to do better in changing environments, and explain how they make 
decisions and take actions. In confronting real-world computing problems, it is 
frequently advantageous to use several computing techniques synergistically rather than 
exclusively, resulting in the construction of complementary hybrid intelligent systems. In 
this chapter, we discuss the various models and techniques of soft computing for 
constructing hybrid systems. 

3.1 Introduction 

Soft computing refers to a collection of computational techniques in computer science, 
artificial intelligence, machine learning and some engineering disciplines, which attempt 
to study, model, and analyze very complex phenomena: those for which more 
conventional methods have not yielded low-cost, analytic, and complete solutions. Earlier 
computational approaches could model and precisely analyze only relatively simple 
systems. More complex systems arising in biology, medicine, the humanities, 
management sciences, and similar fields often remained intractable to conventional 
mathematical and analytical methods. It should be pointed out that simplicity and 
complexity of systems are relative, and many conventional mathematical models have 
been both challenging and very productive. 

Soft computing, a concept introduced by L.A. Zadeh in the early 1990s, is an evolving 
collection of methodologies for the representation of the ambiguity in human thinking. 
The core methodologies of soft computing are fuzzy logic, neural networks, and 
evolutionary computation. Soft computing aims to exploit the tolerance for imprecision, 
uncertainty, approximate reasoning, and partial truth in order to achieve tractability, 
robustness, and low-cost solutions. 

The study of neural networks dates back to the 1940s. Neural networks are a 
well-established computational paradigm in the field of artificial intelligence (AI) [1-4]. 
The wide adoption of neural networks is due to their strong learning and generalization 
capabilities. They are usually seen as a method for implementing complex nonlinear 
mappings (functions) using simple elementary units that are connected together through 
weighted, adaptable connections. We concentrate on optimizing the connection structure 
of the networks. 

The theory of fuzzy logic and fuzzy sets was introduced by L.A. Zadeh in 1965. Fuzzy 
logic provides a means for treating uncertainty and computing with words. This is 
especially useful to mimic human recognition, which skillfully copes with uncertainty. 
Fuzzy systems are conventionally created from explicit knowledge expressed in the form 
of fuzzy rules, which are designed based on experts’ experience. A fuzzy system can 
explain its action by fuzzy rules. Fuzzy systems can also be used for function 
approximation. The synergy of fuzzy logic and neural networks generates neurofuzzy 
systems, which inherit the learning capability of neural networks and the knowledge- 
representation capability of fuzzy systems. 

Evolutionary algorithms (EAs) [5-15], inspired by the principles of biological evolution, 
are another paradigm in AI that has received much attention lately. Evolutionary 
computation is a computational method for obtaining the best possible solutions in a huge 
solution space based on Darwin’s survival of the fittest principle. Evolutionary 
algorithms are a class of robust adaptation and global optimization techniques for many 
hard problems. Among evolutionary algorithms, the genetic algorithm is the best known 
and most studied, while the evolution strategy is more efficient for numerical 
optimization. More and more biologically or nature-inspired algorithms are emerging. 
Evolutionary computation has been applied to optimize the structure or parameters of 
neural networks, fuzzy systems, and neurofuzzy systems. The hybridization of neural 
networks, fuzzy logic, and evolutionary computation provides a powerful means for 
solving real-world problems of computation and prediction. 

3.2 The Neural Networks 

The study of neural networks was inspired by the study of the human brain: how the 
brain works to make decisions, how it stores patterns, and how it learns to recognize 
objects. The human brain contains an average of $3 \times 10^{10}$ neurons [16] of 
various types, which are basically known as decision elements, with each neuron 
connecting to up to $10^4$ synapses [17]. Artificial neural networks are relatively crude 
electronic models based on the neural structure of the brain [18-20]. Neural networks are 
attractive because they consist of many neurons, each of which processes information 
separately and simultaneously. All the neurons are connected by synapses with variable 
weights. Thus, neural networks are actually parallel distributed processing systems. 

In the 1940s, McCulloch and Pitts found that the neuron can be modeled as a simple 
threshold device that performs logic functions [21]. In the late 1940s, Hebb proposed the 
Hebbian rule [22] to describe how learning affects the synapses between two neurons. In 
the late 1950s and early 1960s, Rosenblatt [23] proposed the perceptron model, and 
Widrow and Hoff [24] proposed the adaline (adaptive linear element) model, trained with 
a least mean squares (LMS) method. Many further studies and developments of neural 
networks were subsequently reported [25-28]. The landmark of the field is the multilayer 
perceptron (MLP) model trained with the backpropagation (BP) learning algorithm, 
published in 1986 by Rumelhart et al. [29]. It later turned out that the BP algorithm had 
already been described in 1974 by Werbos [30] when he studied social problems. 

3.2.1 Neurons 

The human body is made up of a vast array of living cells. Certain cells are 
interconnected in a way that allows them to communicate pain, or to actuate fibers or 
tissues. Some cells control the opening and shutting of minuscule valves in the veins and 
arteries. Others tell the brain that they are experiencing cold, heat, or any number of 
sensations. These specialized communication cells are called neurons. Figure 3.1 shows 
the biological structure of a neuron. 



Figure 3.1: A biological neuron. 

The biological neurons are equipped with long tentacle-like structures that stretch out 
from the cell body, permitting them to communicate with other neurons. The tentacles 
that take in signals from other cells and the environment itself are called dendrites, while 
the tentacles that carry signals from the neuron to other cells are called axons. The 
interaction of the cell body itself with the outside environment through its dendritic 
connections and the local conditions in the neuron itself cause the neuron to pump either 
sodium or potassium in and out, raising and lowering the neuron’s electrical potential. 
When the neuron’s electrical potential exceeds a threshold, the neuron fires, creating an 
action potential that flows down the axons to the synapses and other neurons. The action 
potential is created when the voltage across the cell membrane of the neuron becomes too 
large and the cell “fires,” creating a spike that travels down the axon to other neurons and 
cells. If the stimulus causing the buildup in voltage is low, it takes a long time to cause 
the neuron to fire; if it is high, the neuron fires much faster. The neuron, the basic 
information-processing unit of a neural network (NN), consists of: 

i. A set of links, describing the neuron inputs, with weights $w_1, w_2, \ldots, w_m$. 

ii. An adder function, for computing the weighted sum of the inputs. 

iii. Activation function (squashing function) for limiting the amplitude of the neuron 
output. The activation function represents a linear or nonlinear mapping from the 
input to the output. 

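The net input u of the neuron, that is, the weighted sum of its inputs $x_1, \ldots, x_m$ 
minus the threshold θ, is calculated by 

$$u = \sum_{i=1}^{m} w_i x_i - \theta \qquad (3.1)$$

and the final output z is calculated as: 

$$z = \phi(u) \qquad (3.2)$$

where φ(·) is the activation function. 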

Several functions can be used as activation functions, including the following: 

1. Hard limiter 

$$\phi(u) = \begin{cases} 1, & u \ge 0 \\ -1, & u < 0 \end{cases} \qquad (3.3)$$

2. Piecewise linear function 

3. Logistic sigmoid function 

$$\phi(u) = \frac{1}{1 + e^{-\beta u}} \qquad (3.5)$$

4. Hyperbolic tangent function 

$$\phi(u) = \tanh(\beta u) \qquad (3.6)$$

where β > 0 is the gain, which controls the slope of the function. 
Figure 3.2: Different types of activation functions: (a) hard limiter, (b) piecewise linear, 
(c) sigmoid, (d) hyperbolic tangent. 

All the above functions are monotonically nondecreasing, and their outputs lie within 
[-1, 1] or [0, 1]. In practical implementations, all the neurons are typically assumed to 
have the same activation function. Depending on the type of processor, the calculation of 
nonlinear activation functions may consume considerable time [31]. 

The most commonly used transfer function is the sigmoid or logistic function, because it 
has nice mathematical properties such as monotonicity, continuity, and differentiability, 
which are very important when training a neural network with gradient descent. Initially, 
scientists studied the single neuron with a hard-limiter or step-transfer function. 
McCulloch and Pitts used an “all or nothing” process to describe neuron activity [21]. 
Rosenblatt [23] used a hard limiter as the transfer function and termed the hard-limiter 
neuron a perceptron because it could be taught to solve simple problems. A hard-limiter 
neuron is a linear classifier: its decision boundary, $\sum_i w_i x_i = \theta$, is a straight 
line in two dimensions (in general, a hyperplane). 
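To illustrate a hard-limiter neuron acting as a linear classifier, here is a minimal Python sketch of the classic perceptron learning rule. The function names, the bipolar targets, the zero initialization, and the AND training set are assumptions made for illustration; the net input follows (3.1) and the activation is the hard limiter (3.3).

```python
def hard_limiter(u):
    """Bipolar hard limiter: +1 if u >= 0, otherwise -1."""
    return 1 if u >= 0 else -1


def train_perceptron(samples, eta=1.0, max_epochs=100):
    """Perceptron rule for targets d in {-1, +1}.

    Net input is u = sum_i w_i x_i - theta. On each mistake the weights move
    toward d * x, and theta moves the opposite way because it enters u with a
    minus sign.
    """
    n = len(samples[0][0])
    w = [0.0] * n
    theta = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in samples:
            u = sum(wi * xi for wi, xi in zip(w, x)) - theta
            if hard_limiter(u) != d:
                w = [wi + eta * d * xi for wi, xi in zip(w, x)]
                theta -= eta * d
                mistakes += 1
        if mistakes == 0:  # every sample is on the correct side of the line
            break
    return w, theta


# Logical AND with bipolar targets is linearly separable.
AND_SAMPLES = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, theta = train_perceptron(AND_SAMPLES)
print(w, theta)
```

Because AND is linearly separable, the loop stops once a full epoch passes without mistakes. The learned line $\sum_i w_i x_i = \theta$ then separates (1, 1) from the other three inputs.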

In its simplest form, with unit gain, the logistic sigmoid function is given by the 
following equation: 

$$z = \frac{1}{1 + e^{-u}} \qquad (3.7)$$

Substituting the value of u from equation (3.1), we get: 

$$z = \frac{1}{1 + e^{-\left(\sum_{i=1}^{m} w_i x_i - \theta\right)}} \qquad (3.8)$$
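The single-neuron computation of (3.1), (3.2), and (3.8) can be sketched in a few lines of Python. This is an illustration only. The function names and the sample weights are assumptions, and each activation takes the gain β as an optional parameter with a default of 1.

```python
import math


def hard_limiter(u):
    """+1 for u >= 0, -1 otherwise."""
    return 1.0 if u >= 0 else -1.0


def logistic(u, beta=1.0):
    """Logistic sigmoid with gain beta; beta = 1 gives the unit-gain form."""
    return 1.0 / (1.0 + math.exp(-beta * u))


def tanh_act(u, beta=1.0):
    """Hyperbolic tangent with gain beta."""
    return math.tanh(beta * u)


def neuron_output(weights, inputs, theta, activation=logistic):
    """u = sum_i w_i x_i - theta, then z = phi(u)."""
    u = sum(w * x for w, x in zip(weights, inputs)) - theta
    return activation(u)


# Here u = 0.5*1.0 + (-0.25)*2.0 - 0.0 = 0, so each activation is evaluated at 0.
w, x = [0.5, -0.25], [1.0, 2.0]
print(neuron_output(w, x, 0.0))                # 0.5
print(neuron_output(w, x, 0.0, tanh_act))      # 0.0
print(neuron_output(w, x, 0.0, hard_limiter))  # 1.0
```

The naive logistic formula can overflow for very negative u (below about -709 in double precision). The later sketches use an overflow-safe form where that matters.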

3.2.2 The Neural Network Architecture 

The connection weight matrix $W = [w_{ij}]$, where $w_{ij}$ denotes the connection weight 
from node i to node j, is used to describe the network architecture. When $w_{ij} = 0$, 
there is no connection from node i to node j. By setting the connection weights between 
nodes to zero, one can realize different network topologies. Basically, all artificial neural 
networks have a similar structure or topology, as shown in Figure 3.3. In that structure, 
some of the neurons interface with the real world to receive its inputs. Other neurons 
provide the real world with the network's outputs. This output might be the particular 
character that the network thinks it has scanned or the particular image it thinks is being 
viewed. All the rest of the neurons are hidden from view. 



Figure 3.3: A simple neural network diagram 


According to their architecture, neural networks can be broadly classified into 
feedforward neural networks (FNNs), recurrent neural networks (RNNs), and their 
combinations. Some popular network topologies include fully connected layered FNNs, 
RNNs, lattice networks, and layered FNNs with lateral connections. The nonzero elements 
of W can be adapted by a learning algorithm. In an FNN, the connections between 
neurons are in a feedforward manner, and the network is usually arranged in the form of 
layers. In layered FNNs, there are no connections between the neurons within each layer 
and no feedback between layers. A fully connected layered FNN is a network in which 
every node in any layer is connected to every node in its adjacent forward layer. When 
some of the connections are missing, it becomes a partially connected layered FNN. FNNs 
exhibit no dynamic properties; such a network simply implements a static nonlinear 
mapping. The popular MLP and the radial basis function network (RBFN) are fully 
connected layered FNNs. In an RNN, there is at least one feedback connection that 
corresponds to an integration operation or a unit delay. Thus, an RNN actually represents 
a nonlinear dynamic system. 
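To make the role of W concrete, the following minimal sketch treats every nonzero entry $w_{ij}$ as a directed connection from node i to node j. It then checks whether the resulting topology is purely feedforward or contains a feedback path, which would make the network recurrent. The function name and the example matrices are assumptions chosen for illustration.

```python
def is_feedforward(W):
    """Return True if the connection graph encoded by W has no directed cycle.

    W[i][j] != 0 means a connection from node i to node j, and a zero entry
    means no connection. Any cycle, including a self-loop W[i][i] != 0, is a
    feedback path and makes the network recurrent.
    """
    n = len(W)
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = [WHITE] * n

    def visit(i):
        state[i] = GRAY
        for j in range(n):
            if W[i][j] != 0:
                if state[j] == GRAY:  # back edge: a feedback connection
                    return False
                if state[j] == WHITE and not visit(j):
                    return False
        state[i] = BLACK
        return True

    return all(state[i] != WHITE or visit(i) for i in range(n))


# Nodes 0-1: inputs, 2-3: hidden, 4: output (a 2-2-1 fully connected FNN).
W_ffn = [
    [0, 0, 0.3, -0.2, 0],
    [0, 0, 0.1, 0.4, 0],
    [0, 0, 0, 0, 0.7],
    [0, 0, 0, 0, -0.5],
    [0, 0, 0, 0, 0],
]
W_rnn = [row[:] for row in W_ffn]
W_rnn[4][2] = 0.2  # feedback from the output node back to hidden node 2
print(is_feedforward(W_ffn), is_feedforward(W_rnn))  # True False
```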

Most applications require networks that contain at least the three standard types of 
layers: input, hidden, and output. The layer of input neurons receives the data either from 
input files or directly from electronic sensors in real-time applications. The output layer 
sends information directly to the outside world, to a secondary computer process, or to 
other devices such as a mechanical control system. Between these two layers there can be 
many hidden layers. These internal layers contain many of the neurons in various 
interconnected structures. The inputs and outputs of each of these hidden neurons simply 
go to other neurons. In most networks, each neuron in a hidden layer receives signals 
from all of the neurons in the preceding layer (for the first hidden layer, this is the input 
layer). After a neuron performs its function, it passes its output to all of the neurons in 
the next layer, providing a feedforward path to the output. The way the neurons are 
connected to each other has a significant impact on the operation of the network. 
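The layer-to-layer flow described above can be sketched as a forward pass through a fully connected layered FNN. In this minimal illustration, every neuron computes (3.1) followed by a unit-gain logistic activation. The layer representation as (W, θ) pairs and the parameter values are assumptions.

```python
import math


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def layer_forward(W, theta, inputs):
    """One fully connected layer: W[i][j] links input node i to node j."""
    return [
        logistic(sum(W[i][j] * inputs[i] for i in range(len(inputs))) - theta[j])
        for j in range(len(theta))
    ]


def network_forward(layers, x):
    """Propagate x through a list of (W, theta) layers, from input to output."""
    signal = list(x)
    for W, theta in layers:
        signal = layer_forward(W, theta, signal)
    return signal


# Illustrative 2-3-1 network with arbitrary, untrained parameters.
hidden = ([[0.2, -0.4, 0.1], [0.5, 0.3, -0.6]], [0.0, 0.1, -0.1])
output = ([[1.0], [-1.0], [0.5]], [0.2])
print(network_forward([hidden, output], [1.0, 0.0]))
```

Each layer only sees the outputs of the previous one, which is exactly what makes the path feedforward.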

3.2.3 Training of ANN 

Once the architecture of the neural network has been fixed for a given application, the 
network must be trained (it must learn) to produce the desired output. Training, or 
learning, of a neural network is an optimization process that adjusts the network 
parameters so that the network output is as close as possible to the desired output. This 
kind of parameter estimation procedure is also called a learning or training algorithm. 
Neural networks are usually trained by epoch. An epoch is one complete pass in which 
every training example is presented to the network and processed by the learning 
algorithm exactly once. After learning, a neural network represents a complex 
relationship and possesses the ability to generalize: when a new input is presented to the 
trained network, a reasonable output is produced. Learning methods are conventionally 
divided into supervised, unsupervised, reinforcement, and evolutionary learning. 
Supervised learning is widely used in pattern recognition, approximation, control, 
modeling and identification, signal processing, and optimization. Reinforcement learning 
is usually used in control. Unsupervised learning schemes are mainly used for pattern 
recognition, clustering, vector quantization, signal coding, and data analysis. 
Evolutionary computation is a class of optimization techniques that can be used to search 
for the global minima or maxima of an objective function. Evolutionary learning uses an 
evolutionary algorithm (EA) to adjust the neural network architecture and parameters, and 
can also be used to optimize the control parameters of a supervised or unsupervised 
learning algorithm. 

Supervised learning is based on a direct comparison between the actual network output 
and the desired output. Network parameters (weights) are adjusted using the training 
patterns and the corresponding errors between the desired output and the actual network 
response. The errors are first calculated and then propagated back through the system, 
causing it to adjust the weights; this is what drives the learning process. The set of 
patterns that enables the learning is called the "training set." During learning, the same 
set of data is processed many times as the connection weights are progressively refined. 
Supervised learning can thus be viewed as a closed-loop feedback system in which the 
error is the feedback signal. The trained network is then used to emulate the system. 
To control a learning process, a criterion is needed to decide when to terminate it. For 
supervised learning, an error measure, which quantifies the difference between the 
network output and the desired output of the training samples, is used to guide the 
learning process. The error measure is usually defined as the mean squared error, 
calculated by the error function 

$$E = \frac{1}{2N} \sum_{p=1}^{N} \left\| \hat{z}_p - z_p \right\|^2 \qquad (3.9)$$

where N is the total number of pattern pairs in the training set, $z_p$ is the desired 
(target) output of the p-th pair, and $\hat{z}_p$ is the output calculated by the network for 
that pair. This function serves as the objective function for optimizing the network. The 
error E is recalculated after each epoch, and training is terminated when E is sufficiently 
small or a failure criterion is met. To reduce the error to an insignificant value, a 
gradient-descent procedure is usually applied. The LMS [24] and backpropagation [29] 
algorithms are two early, but still among the most popular, supervised learning 
algorithms; both are derived using a gradient-descent procedure. When the system has 
finally been trained correctly and no further learning is needed, the weights can, if 
desired, be "frozen." In some systems, this finalized network is then turned into hardware 
so that it can run fast. Other systems do not lock themselves in but continue to learn while 
in production use. 
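The error measure and the termination logic can be sketched as follows. This sketch assumes the error takes the form $E = \frac{1}{2N}\sum_p \|\hat{z}_p - z_p\|^2$. The `train` loop and its `run_epoch` and `error_fn` callbacks are hypothetical names standing in for whatever learning algorithm is used.

```python
def mean_squared_error(targets, outputs):
    """E = 1/(2N) * sum_p ||zhat_p - z_p||^2 over N pattern pairs."""
    N = len(targets)
    total = 0.0
    for z, zhat in zip(targets, outputs):
        total += sum((zh - zi) ** 2 for zh, zi in zip(zhat, z))
    return total / (2 * N)


def train(run_epoch, error_fn, tol=1e-3, max_epochs=1000):
    """Generic epoch loop: stop when E < tol or when the epoch budget runs out.

    run_epoch() performs one pass over the training set and error_fn() returns
    the current E (both are hypothetical callbacks). Assumes max_epochs >= 1.
    """
    for epoch in range(1, max_epochs + 1):
        run_epoch()
        E = error_fn()
        if E < tol:
            return epoch, E, True  # converged
    return max_epochs, E, False    # failure criterion: budget exhausted


# Usage with a stand-in "learner" whose error halves every epoch.
state = {"E": 1.0}


def fake_epoch():
    state["E"] /= 2


print(train(fake_epoch, lambda: state["E"], tol=1e-3))  # (10, 0.0009765625, True)
```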
Unsupervised learning involves no target values. It tries to auto-associate information 
from the inputs to decide which features to use to group the input data. Unsupervised 
learning is based solely on the correlations among the input data and is used to find 
significant patterns or features in the input data without any supervision. A criterion is 
needed to terminate the learning process. Without a termination criterion, the learning 
process continues even when a pattern that does not belong to the training set is 
presented to the network; the network thus adapts to a constantly changing environment. 
Hebbian learning [22], competitive learning [26], and Kohonen's SOM [32,33] are the 
three most widely used unsupervised learning approaches. In general, unsupervised 
learning is slow to settle into stable conditions. In Hebbian learning [22], learning is a 
purely local phenomenon, involving only two neurons and a synapse: the synaptic weight 
change is proportional to the correlation between the pre- and postsynaptic signals. The 
C-means algorithm is a popular competitive learning-based clustering method [34]; by 
using the correlation of the input vectors, the learning rule changes the network weights 
to group the input vectors into clusters. The Boltzmann machine [35] uses a stochastic 
training technique known as simulated annealing (SA) [36], which can be treated as a 
special type of unsupervised learning based on the inherent property of a physical system. 
Teuvo Kohonen, an electrical engineer at the Helsinki University of Technology, 
developed a self-organizing network [37], sometimes called an autoassociator, that learns 
without the benefit of knowing the right answer. It is an unusual-looking network in that 
it contains a single layer with many connections. The weights for those connections have 
to be initialized and the inputs have to be normalized. The neurons are set up to compete 
in a winner-take-all fashion. Another common unsupervised model is the Hopfield neural 
network [38,39] of associative memory. Hopfield associated an energy function with the 
network and related the network to other physical systems. The Hopfield network is a 
fully interconnected network with symmetric weights, no self-feedback, and asynchronous 
updating of the states of the processing elements. 
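The winner-take-all competition can be sketched as a simple competitive learning loop. In this loop the inputs and weight vectors are normalized, the unit whose weight vector has the largest dot product with the input wins, and only the winner moves toward that input. This is a minimal illustration under several assumptions: initializing the weights from data points, renormalizing after each update, and the learning rate are all choices made here. It is not presented as Kohonen's exact procedure or as the C-means algorithm.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def winner(prototypes, x):
    """Index of the unit whose normalized weight vector best matches x."""
    scores = [sum(w * c for w, c in zip(p, x)) for p in prototypes]
    return max(range(len(prototypes)), key=scores.__getitem__)


def competitive_learning(data, prototypes, eta=0.1, epochs=20):
    """Winner-take-all: only the winning unit moves toward each input."""
    data = [normalize(x) for x in data]          # inputs are normalized
    protos = [normalize(p) for p in prototypes]  # weights are initialized
    for _ in range(epochs):
        for x in data:
            k = winner(protos, x)
            moved = [w + eta * (c - w) for w, c in zip(protos[k], x)]
            protos[k] = normalize(moved)
    return protos


cluster_a = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1]]
cluster_b = [[0.1, 1.0], [0.0, 0.9], [-0.1, 1.0]]
protos = competitive_learning(cluster_a + cluster_b, [cluster_a[0], cluster_b[0]])
print([winner(protos, normalize(x)) for x in cluster_a + cluster_b])  # [0, 0, 0, 1, 1, 1]
```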

Reinforcement learning [40] is a special case of supervised learning in which the exact 
desired output is unknown. It is based only on information as to whether or not the 
actual output is close to the estimate. Explicit computation of derivatives is not required; 
however, learning is slower. Reinforcement learning is a learning procedure that rewards 
the neural network for a good output and punishes it for a bad one. It is used when the 
correct output for an input pattern is not available but a certain output must nevertheless 
be developed. The evaluation of an output as good or bad depends on the specific 
problem and the environment. For a control system, if the controller still works properly 
after an input, the output is judged as "good"; otherwise, it is considered "bad." The 
evaluation of the output is binary and is called external reinforcement. Thus, 
reinforcement learning is a kind of supervised learning with the external reinforcement as 
the error signal. Reinforcement learning can learn the system structure by trial and error 
and is suitable for online learning [41-44]. 

The evolutionary learning approach is attractive because it handles the global search 
problem on a vast, complex, multimodal, and nondifferentiable surface better than 
gradient-based methods. It does not depend on the gradient information of the error (or 
fitness) function, and is thus particularly appealing when this information is unavailable 
or very costly to obtain or estimate. Evolutionary algorithms can be used to search for the 
optimal control parameters in supervised as well as unsupervised learning by optimizing 
their respective objective functions. They can also be used as an independent training 
method for the network parameters by optimizing the error function. Evolutionary 
algorithms are widely used for training neural networks and tuning fuzzy systems, and are 
generally much less sensitive to the initial conditions. They search for a globally optimal 
solution, whereas supervised and unsupervised learning algorithms can typically find only 
a local optimum in a neighborhood of the initial solution [45]. 
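As one deliberately simple instance of gradient-free evolutionary training, the sketch below uses a (1+1) evolution strategy to adjust the parameters of a single logistic neuron. The strategy Gaussian-mutates the current parameter vector and keeps the child only if its error is no worse. The choice of a (1+1)-ES, the mutation size, the fixed random seed, and the OR training set are all assumptions. The section itself does not prescribe a particular evolutionary algorithm.

```python
import math
import random


def logistic(u):
    """Overflow-safe logistic sigmoid."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def error(params, samples):
    """E = 1/(2N) * sum of squared errors of one neuron; params = [w1, w2, theta]."""
    w1, w2, theta = params
    return sum((logistic(w1 * x1 + w2 * x2 - theta) - z) ** 2
               for (x1, x2), z in samples) / (2 * len(samples))


def one_plus_one_es(samples, sigma=0.5, generations=2000, seed=0):
    """(1+1) evolution strategy: mutate, keep the child only if no worse.

    No gradient of the error function is ever computed.
    """
    rng = random.Random(seed)
    parent = [0.0, 0.0, 0.0]
    parent_err = error(parent, samples)
    for _ in range(generations):
        child = [p + rng.gauss(0.0, sigma) for p in parent]
        child_err = error(child, samples)
        if child_err <= parent_err:  # selection: the fitter individual survives
            parent, parent_err = child, child_err
    return parent, parent_err


OR_SAMPLES = [((0, 0), 0.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 1.0)]
initial = error([0.0, 0.0, 0.0], OR_SAMPLES)  # every output is 0.5 here
best, best_err = one_plus_one_es(OR_SAMPLES)
print(initial, best_err)
```

Population-based methods such as genetic algorithms follow the same mutate-evaluate-select pattern, but they keep many candidate solutions at once.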

3.2.4 Characteristics of ANN 

The overall functioning of neural networks is divided into two stages: learning (training) 
and generalization (recalling). Network training is done by presenting the input samples 
to the network, and the network parameters are adapted using a learning method. This can 
be done in an online or offline manner. Once the network is trained to accomplish the 
desired performance, the learning process is terminated. For real-time applications, a 
neural network is required to have a constant processing delay regardless of the number 
of input nodes, and a minimum number of layers. As the number of input nodes increases, 
the size of the network layers should grow at the same rate, without additional layers. 
Artificial neural networks are characterized by the network architecture, node 
characteristics, and learning rules. Neural networks are usually biologically motivated. 
Each neuron is a computational node, which represents a nonlinear function. Neural 
networks possess the following advantages: 

Neural networks have strong learning capability. They can adapt themselves to their 
surrounding environment by changing the network parameters. Powerful learning 
algorithms make this capability possible. Learning and generalization are the most salient 
features of neural networks. 

A well-trained neural network has superior generalization capability. This can be 
attributed to the bounded and smooth nature of the hidden-unit responses. The bounded 
unit response localizes the nonlinear effects of the individual hidden units in a neural 
network and allows the approximations in different regions of the input space to be 
tuned independently [42]. In contrast, in conventional curve-fitting methods, polynomials 
and similar functions are prone to divergence. 

Some neural networks such as the SOM [44] and competitive learning based neural 
networks have a self-organization property. The training of these networks is based on 
the unsupervised learning algorithms. 

Neural networks are robust and fault tolerant. A neural network can easily handle 
imprecise, fuzzy, noisy, and probabilistic information. It is a distributed information 
system, in which information is stored throughout the whole network by the network 
structure. Thus, the overall performance does not degrade significantly when the 
information at some node is lost or some connections in the network are damaged. The 
network can then recover its performance by updating the connection weights using the 
learning rule and the current result. Therefore, the network can repair itself and possesses 
a strong fault-tolerant capability. Some neural networks, such as the Hopfield network, can 
be used as associative storage of information. When a noisy or incomplete pattern is 
presented to a trained network, it can retrieve the correct stored pattern; that is, the 
trained network is fault tolerant. 
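Associative recall of this kind can be sketched with a small Hopfield network. The sketch builds symmetric weights from an outer-product (Hebbian-style) sum and sets the diagonal to zero, so there is no self-feedback. It then updates the nodes asynchronously until no node changes. The two stored patterns, which are chosen to be orthogonal, and the function names are assumptions for illustration.

```python
def train_hopfield(patterns):
    """Outer-product storage: symmetric weights with a zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-feedback
                    W[i][j] += p[i] * p[j]
    return W


def recall(W, state, max_sweeps=100):
    """Asynchronous updates, one node at a time, until the state is stable."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            h = sum(W[i][j] * s[j] for j in range(n))
            new = s[i] if h == 0 else (1 if h > 0 else -1)
            if new != s[i]:
                s[i], changed = new, True
        if not changed:  # stable state: no node wants to flip
            break
    return s


p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = train_hopfield([p1, p2])
noisy = [-1] + p1[1:]          # p1 with its first element flipped
print(recall(W, noisy) == p1)  # True
```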

The massive parallelism of simple, uniform units can readily be implemented in analog 
VLSI or optical hardware [45,46], or on special-purpose massively parallel hardware [47]. 

3.3 The Backpropagation Learning Algorithm 

The backpropagation (BP) learning algorithm is currently the most popular supervised 
learning rule for performing pattern classification tasks [24,29,30]. It is used not only to 
train feedforward neural networks such as the multilayer perceptron, but has also been 
adapted to recurrent neural networks [48]. The BP algorithm is a generalization of the 
delta rule, known as the least mean square algorithm [24]; thus, it is also called the 
generalized delta rule. BP overcomes the limitations of perceptron learning enumerated 
by Minsky and Papert [49]. With the BP algorithm, the MLP can be extended to many 
layers. The BP algorithm propagates the error between the desired signal and the network 
output backward through the network. After an input pattern is presented, the output of 
the network is compared with a given target pattern and the error of each output unit is 
calculated. This error signal is propagated backward, and a closed-loop control system is 
thus established. The weights can be adjusted by a gradient-descent-based algorithm. In 
order to implement the BP algorithm, a continuous, nonlinear, monotonically increasing, 
differentiable activation function is required. The 
two most-used activation functions are the logistic function (3.5) and the hyperbolic 
tangent function (3.6), and both are sigmoid functions. 
We want to train a multilayer feedforward network by gradient descent to approximate 
an unknown function, based on training data consisting of pairs $(x, z) \in S$. The vector x 
represents an input pattern to the network, and the vector z the corresponding desired 
output from the training set S. The objective function to be optimized is the MSE, 
calculated by equation (3.9). 

All the network parameters $W^{(m-1)}$ and $\theta^{(m)}$, $m = 2, \ldots, M$, can be 
combined and represented by the matrix $W = [w_{ij}]$. The error function E can be 
minimized by applying the gradient-descent procedure: 

$$\Delta W = -\eta \frac{\partial E}{\partial W} \qquad (3.10)$$

where η is a learning rate or step size, provided that it is a sufficiently small positive 
number. 

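Consider a network of M layers, where layer 1 receives the input and layer M produces the 
output. Let $o_{p,i}^{(m)}$ denote the output of node i in layer m for pattern p, so that 
$o_p^{(1)} = x_p$ and $o_p^{(M)} = \hat{z}_p$, and let $w_{ij}^{(m)}$ be the weight from 
node i in layer m to node j in layer m + 1. Following (3.1) and (3.2), the net input and the 
output of node j in layer m + 1 are 

$$u_{p,j}^{(m+1)} = \sum_{i} w_{ij}^{(m)} o_{p,i}^{(m)} - \theta_j^{(m+1)}, \qquad o_{p,j}^{(m+1)} = \phi\left(u_{p,j}^{(m+1)}\right)$$

Writing $E = \frac{1}{N}\sum_{p=1}^{N} E_p$ with $E_p = \frac{1}{2}\left\|\hat{z}_p - z_p\right\|^2$ 
and applying the chain rule for each pattern p, the gradient in (3.10) can be expressed as 

$$\frac{\partial E_p}{\partial w_{ij}^{(m)}} = \frac{\partial E_p}{\partial u_{p,j}^{(m+1)}} \, \frac{\partial u_{p,j}^{(m+1)}}{\partial w_{ij}^{(m)}} \qquad (3.11)$$

$$\frac{\partial u_{p,j}^{(m+1)}}{\partial w_{ij}^{(m)}} = o_{p,i}^{(m)} \qquad (3.12)$$

$$\frac{\partial E_p}{\partial u_{p,j}^{(m+1)}} = \frac{\partial E_p}{\partial o_{p,j}^{(m+1)}} \, \dot{\phi}\left(u_{p,j}^{(m+1)}\right) \qquad (3.13)$$

where $\dot{\phi}$ denotes the derivative of the activation function. 

For the output units, m = M - 1, 

$$\frac{\partial E_p}{\partial o_{p,j}^{(M)}} = e_{p,j} = \hat{z}_{p,j} - z_{p,j} \qquad (3.14)$$

For the hidden units, m = 1, 2, ..., M - 2, 

$$\frac{\partial E_p}{\partial o_{p,j}^{(m+1)}} = \sum_{k} \frac{\partial E_p}{\partial u_{p,k}^{(m+2)}} \, w_{jk}^{(m+1)} \qquad (3.15)$$

Define the delta function by 

$$\delta_{p,j}^{(m)} = -\frac{\partial E_p}{\partial u_{p,j}^{(m)}} \qquad (3.16)$$

for m = 2, 3, ..., M. By substituting (3.14), (3.15), and (3.16) into (3.13), we finally 
obtain the following. 

For the output units, m = M - 1, 

$$\delta_{p,j}^{(M)} = -e_{p,j} \, \dot{\phi}\left(u_{p,j}^{(M)}\right) \qquad (3.17)$$

For the hidden units, m = 1, ..., M - 2, 

$$\delta_{p,j}^{(m+1)} = \dot{\phi}\left(u_{p,j}^{(m+1)}\right) \sum_{k} \delta_{p,k}^{(m+2)} \, w_{jk}^{(m+1)} \qquad (3.18)$$

Equations (3.17) and (3.18) provide a recursive method to compute the deltas for the 
whole network, layer by layer from the output backward. Combining (3.11), (3.12), and 
(3.16) gives $\partial E_p / \partial w_{ij}^{(m)} = -\delta_{p,j}^{(m+1)} o_{p,i}^{(m)}$. Thus, W 
can be adjusted pattern by pattern by 

$$\Delta w_{ij}^{(m)} = \eta \, \delta_{p,j}^{(m+1)} \, o_{p,i}^{(m)} \qquad (3.19)$$

For the activation functions, we have the following relations. For the logistic function 
$\phi(u) = 1/(1 + e^{-\beta u})$, 

$$\dot{\phi}(u) = \beta \, \phi(u)\left[1 - \phi(u)\right] \qquad (3.20)$$

For the tanh function $\phi(u) = \tanh(\beta u)$, 

$$\dot{\phi}(u) = \beta\left[1 - \phi^2(u)\right] \qquad (3.21)$$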
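The delta recursion of (3.17) and (3.18) can be sketched in Python for a layered network of any depth. The sketch makes several assumptions. Every node uses the logistic activation with gain β = 1, and the per-pattern error is $E_p = \frac{1}{2}\|\hat{z}_p - z_p\|^2$. Thresholds are subtracted as in (3.1), so the threshold gradient is $+\delta$. The OR example and all names are illustrative. The test at the end compares the analytic gradients against central finite differences, which is a standard way to confirm a BP implementation.

```python
import math
import random

BETA = 1.0  # gain of the logistic activation (assumed value)


def phi(u):
    """Logistic 1 / (1 + exp(-BETA*u)), written to avoid overflow."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-BETA * u))
    e = math.exp(BETA * u)
    return e / (1.0 + e)


def forward(weights, thetas, x):
    """Return the outputs of every layer; outs[0] = x.

    weights[l][i][j] links node i in outs[l] to node j in outs[l+1];
    thetas[l][j] is the threshold of node j in outs[l+1].
    """
    outs = [list(x)]
    for W, theta in zip(weights, thetas):
        prev = outs[-1]
        outs.append([phi(sum(W[i][j] * prev[i] for i in range(len(prev))) - theta[j])
                     for j in range(len(theta))])
    return outs


def pattern_error(weights, thetas, x, z):
    """E_p = 0.5 * ||zhat_p - z_p||^2 for one training pair."""
    zhat = forward(weights, thetas, x)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(zhat, z))


def backprop(weights, thetas, x, z):
    """Gradients of E_p with respect to every weight and threshold."""
    outs = forward(weights, thetas, x)

    def dphi(o):
        return BETA * o * (1.0 - o)  # logistic derivative written via o = phi(u)

    L = len(weights)
    deltas = [None] * L  # deltas[l] belongs to the nodes in outs[l+1]
    # Output layer, Eq. (3.17): delta = -e * phi'(u), with e = zhat - z.
    deltas[-1] = [-(o - t) * dphi(o) for o, t in zip(outs[-1], z)]
    # Hidden layers, Eq. (3.18): pass deltas backward through the weights.
    for l in range(L - 2, -1, -1):
        nxt, W_next = deltas[l + 1], weights[l + 1]
        deltas[l] = [dphi(o) * sum(nxt[k] * W_next[j][k] for k in range(len(nxt)))
                     for j, o in enumerate(outs[l + 1])]
    # dE_p/dw_ij = -delta_j * o_i; theta enters u with a minus sign,
    # so dE_p/dtheta_j = +delta_j.
    grad_w = [[[-deltas[l][j] * outs[l][i] for j in range(len(deltas[l]))]
               for i in range(len(outs[l]))] for l in range(L)]
    grad_t = [list(d) for d in deltas]
    return grad_w, grad_t


def train_incremental(weights, thetas, samples, eta=0.5, epochs=1000):
    """Pattern-by-pattern descent: w_ij += eta * delta_j * o_i."""
    for _ in range(epochs):
        for x, z in samples:
            grad_w, grad_t = backprop(weights, thetas, x, z)
            for l in range(len(weights)):
                for i, row in enumerate(weights[l]):
                    for j in range(len(row)):
                        row[j] -= eta * grad_w[l][i][j]
                for j in range(len(thetas[l])):
                    thetas[l][j] -= eta * grad_t[l][j]


# Illustrative 2-3-1 network trained on logical OR.
rng = random.Random(1)
sizes = [2, 3, 1]
weights = [[[rng.uniform(-0.5, 0.5) for _ in range(sizes[l + 1])]
            for _ in range(sizes[l])] for l in range(len(sizes) - 1)]
thetas = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for n in sizes[1:]]
OR_SAMPLES = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]


def total_error():
    return sum(pattern_error(weights, thetas, x, z) for x, z in OR_SAMPLES)


before = total_error()
train_incremental(weights, thetas, OR_SAMPLES)
print(before, total_error())
```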
The biases can be updated in two ways. The biases in the (m+1)th layer can be expressed 
as an extension of the weights, that is, $w_{0j}^{(m)} = -\theta_j^{(m+1)}$; accordingly, the 
output $o^{(m)}$ is extended to $\left(1, o_1^{(m)}, o_2^{(m)}, \ldots\right)^T$. Another way 
is to use a gradient-descent method with regard to $\theta^{(m+1)}$, following the above 
procedure. 
Since the biases can be treated as special weights, they are usually omitted in practical 
applications. The algorithm is convergent in the mean if $0 < \eta < 2/\lambda_{max}$, 
where $\lambda_{max}$ is the largest eigenvalue of the autocorrelation matrix of the input 
vector x, denoted as C [50]. When η is too small, the possibility of getting stuck at a local 
minimum of the error function is increased. In contrast, the possibility of falling into 
oscillatory traps is high when η is too large. By statistically preprocessing the input 
patterns, namely, decorrelating them, excessively large eigenvalues of C can be avoided; 
η can then be increased, which effectively speeds up convergence. PCA preconditioning 
speeds up BP in most cases, except when the pattern set consists of sparse vectors. In 
practice, η is usually chosen such that 0 < η < 1 so that successive weight changes do not 
overshoot the minimum of the error surface. The BP algorithm can be improved by adding 
a momentum term [29]: 

$$\Delta W(t) = -\eta \frac{\partial E}{\partial W(t)} + \alpha \, \Delta W(t-1) \qquad (3.22)$$

where α is the momentum factor, usually 0 < α < 1, with a typical value of 0.9. This 
method is usually called the BP with momentum (BPM) algorithm. 
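The effect of the step size and the momentum term can be seen on a one-dimensional quadratic stand-in for the error surface, $E(w) = \frac{1}{2}\lambda w^2$. Its curvature λ plays the role of the largest eigenvalue. This toy surface is an assumption chosen because it can be checked by hand: with α = 0, each update multiplies w by $1 - \eta\lambda$, so the iteration converges exactly when $0 < \eta < 2/\lambda$.

```python
def descend(lam, eta, alpha=0.0, w0=1.0, steps=100):
    """Gradient descent with momentum on E(w) = 0.5 * lam * w**2.

    dE/dw = lam * w, so each step is dw(t) = -eta*lam*w + alpha*dw(t-1).
    """
    w, dw = w0, 0.0
    for _ in range(steps):
        dw = -eta * lam * w + alpha * dw
        w += dw
    return w


# Plain gradient descent (alpha = 0) with curvature lam = 2, so 2/lam = 1.
print(abs(descend(lam=2.0, eta=0.4)))  # factor |1 - 0.8| = 0.2 per step: converges
print(abs(descend(lam=2.0, eta=1.2)))  # factor |1 - 2.4| = 1.4 per step: diverges
# With a small step on a flat surface, momentum reaches the minimum much sooner.
print(abs(descend(lam=1.0, eta=0.01)), abs(descend(lam=1.0, eta=0.01, alpha=0.9)))
```

With η = 0.01 and no momentum, w shrinks only by a factor of 0.99 per step. Adding α = 0.9 lets the accumulated velocity carry the weight toward the minimum far faster, which is the motivation for (3.22).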

The BP algorithm is a supervised gradient-descent technique, wherein the MSE between 
the actual output of the network and the desired output is minimized. It is prone to local 
minima in the cost function. The performance can be improved and the occurrence of 
local minima reduced by allowing extra hidden units, lowering the gain term, and training 
with different initial random weights. 

3.3.1 Incremental Learning versus Batch Learning 

Incremental learning and batch learning are two modes of BP learning. In incremental learning, the training patterns are presented to the network sequentially, and the weights are updated by the gradient-descent method after each training example. It is a stochastic optimization method. This learning algorithm has been proved to minimize the global error $E$ when $\eta$ is sufficiently small [29].

In batch learning, the optimization objective is $E$, and the weights are updated at the end of each epoch. It is a deterministic optimization method. The weight increment for each example is accumulated over all the training examples before the weights are actually adapted. For sufficiently small learning rates, incremental learning and batch learning produce the same results [50].
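The difference between the two modes can be seen on a single linear neuron $y = \mathbf{w}^T\mathbf{x}$ with error $E = \tfrac{1}{2}\sum_p (d_p - y_p)^2$. In the sketch below, which is an illustrative assumption rather than the MLP setting of [29, 50], `incremental_epoch` adapts the weights after every pattern, while `batch_epoch` accumulates the increments over the epoch and applies them once. The three training pairs are hypothetical and noise-free, so both modes should settle on the same weights.

```python
def predict(w, x):
    """Output of a linear neuron y = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def incremental_epoch(w, data, eta):
    """Incremental (online) mode: update after every pattern, in presentation order."""
    w = list(w)
    for x, d in data:
        e = d - predict(w, x)
        w = [wi + eta * e * xi for wi, xi in zip(w, x)]
    return w


def batch_epoch(w, data, eta):
    """Batch mode: accumulate increments over the epoch, then adapt the weights once."""
    acc = [0.0] * len(w)
    for x, d in data:
        e = d - predict(w, x)  # every error uses the same (old) weights
        acc = [ai + eta * e * xi for ai, xi in zip(acc, x)]
    return [wi + ai for wi, ai in zip(w, acc)]


# Hypothetical noise-free patterns generated by the target weights [1, 2].
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 3.0)]
w_inc = [0.0, 0.0]
w_bat = [0.0, 0.0]
for epoch in range(500):
    w_inc = incremental_epoch(w_inc, data, eta=0.05)
    w_bat = batch_epoch(w_bat, data, eta=0.05)
print(w_inc, w_bat)
```

After a single epoch the two weight vectors differ, because the incremental mode already uses updated weights for later patterns; with a small $\eta$ they end up at the same solution.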

3.3.2 Selecting the Parameters 

The performance of the backpropagation algorithm is highly dependent upon a suitable selection of $\eta$ and $\alpha$, which, unfortunately, are usually selected by trial and error. For different tasks and at different stages of training, heuristics are needed for optimally adjusting $\eta$ and $\alpha$ so as to speed up the convergence of the algorithm. According to Rumelhart [29], the process of starting with a large $\eta$ and gradually decreasing it is similar to that in simulated annealing [36]: the algorithm escapes from a shallow local minimum in early training and converges into a deeper, possibly global, minimum. Learning parameters are typically adapted once each epoch, and all the weights in the network are typically updated using the global learning parameters $\eta$ and $\alpha$. The optimal $\eta$ is the inverse of the largest eigenvalue $\lambda_{\max}$ of the Hessian matrix $\mathbf{H}$ of the error function [37]. An online algorithm for estimating $\lambda_{\max}$ that does not even require the calculation of the Hessian has been proposed by LeCun [51].
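The rule $\eta_{\text{opt}} = 1/\lambda_{\max}$ can be illustrated with plain power iteration on an explicitly given Hessian. This is only a sketch: it forms $\mathbf{H}$ explicitly, which LeCun's online method [51] is designed to avoid, and the 2x2 matrix below is a hypothetical Hessian of a two-weight quadratic error surface. The function name `largest_eigenvalue` and the all-ones start vector are assumptions.

```python
import math


def largest_eigenvalue(H, iters=100):
    """Estimate the largest eigenvalue of a symmetric PSD matrix H by power iteration.

    Assumes the all-ones start vector is not orthogonal to the top eigenvector.
    """
    n = len(H)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        hv = [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in hv))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in hv]
        # Rayleigh quotient v^T H v for the unit vector v
        lam = sum(v[i] * sum(H[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam


# Hypothetical Hessian of a two-weight quadratic error surface.
H = [[4.0, 1.0], [1.0, 3.0]]
lam_max = largest_eigenvalue(H)
eta_opt = 1.0 / lam_max
print(lam_max, eta_opt)
```

For this matrix $\lambda_{\max} = (7 + \sqrt{5})/2 \approx 4.618$, so the suggested learning rate is $\eta \approx 0.217$.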

3.3.3 Backpropagation with Global Descent

The gradient-descent method is a stochastic dynamical system whose stable points only locally minimize the energy (error) function. The global descent method [52], which is based on a global optimization technique called terminal repeller unconstrained subenergy tunneling (TRUST) [53, 54], is a deterministic dynamical system consisting of a single vector differential equation. The global descent rule replaces the gradient-descent rule for MLP learning. TRUST was introduced for general optimization problems, and it formulates optimization in terms of the flow of a special deterministic dynamical system.

3.3.4 Robust Backpropagation Algorithms

Since the BP algorithm is a special case of stochastic approximation, the techniques of robust statistics [55] can be applied to BP. In the presence of outliers, M-estimator-based robust learning can be applied; the rate of convergence is improved since the influence of the outliers is suppressed. Robust BP algorithms using M-estimator-based criterion functions are a typical class of robust algorithms; examples include the robust BP using Hampel's tanh estimator with time-varying error cutoff points, and the annealing robust BP (ARBP) algorithm [56]. The ARBP algorithm adopts the annealing concept into robust learning: a deterministic annealing process is applied to the scale estimator $\beta(t)$. The algorithm is based on the gradient-descent method, and as $\beta(t) \to \infty$ the ARBP becomes a BP algorithm. The basic idea of using an annealing schedule is to use a larger scale estimator in the early training stage and a smaller scale estimator in the later training stage.
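The role of a scale estimator can be sketched with a Cauchy-type M-estimator loss $\rho(e;\beta) = \frac{\beta}{2}\ln(1 + e^2/\beta)$, whose influence function $\psi(e) = \partial\rho/\partial e = e/(1 + e^2/\beta)$ would replace the raw error $e$ in the BP update. This particular loss and the schedule `beta_schedule` are assumptions chosen for illustration, not necessarily the exact criterion of [56]; the loss is chosen because it reduces to the MSE term $e^2/2$ as $\beta \to \infty$, matching the limiting behaviour described above.

```python
import math


def robust_loss(e, beta):
    """Cauchy-type M-estimator rho(e; beta) = (beta / 2) * ln(1 + e^2 / beta)."""
    return 0.5 * beta * math.log1p(e * e / beta)


def influence(e, beta):
    """psi(e) = d rho / d e = e / (1 + e^2 / beta); bounded, so outliers get little weight."""
    return e / (1.0 + e * e / beta)


def beta_schedule(t, beta0=100.0, beta_min=1.0):
    """Hypothetical annealing schedule: large scale estimator early, smaller later."""
    return max(beta_min, beta0 / (t + 1))


for t in (0, 9, 99):
    beta = beta_schedule(t)
    # A typical residual (0.5) versus an outlier residual (10.0)
    print(t, beta, round(influence(0.5, beta), 4), round(influence(10.0, beta), 4))
```

With $\beta = 100$ an outlier residual of 10 keeps half of its raw influence, while with $\beta = 1$ it is cut to about 0.1; the small residual of 0.5 is reduced far less, which is how the influence of outliers is suppressed as the scale shrinks.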

3.4 The Fuzzy Logic 

The concept of fuzzy sets was first proposed by L.A. Zadeh [57] in the year 1965 as a 
method for mathematical modeling of the uncertainty in human reasoning. Rather than 
the binary logic, fuzzy logic uses the notion of membership. Fuzzy logic is most suitable 
for the representation of hazy data and concepts on an intuitive basis, such as human 
linguistic description. The conventional or crisp set can be treated as a special case of the 
concept of a fuzzy set. A fuzzy set is uniquely determined by its membership function, 
and it is also associated with a linguistically meaningful term. Fuzzy logic provides a 
systematic framework to incorporate human reasoning and experiences. Fuzzy logic is 
based on three core concepts, namely, fuzzy sets, linguistic variables, and possibility 
distributions. A fuzzy set is an effective means to represent linguistic variables. A 
linguistic variable is a variable whose value can be described qualitatively using a 
linguistic expression and quantitatively using a membership function. Linguistic expressions [58] are useful for communicating concepts and knowledge with human beings, whereas membership functions are useful for processing numeric input data. When a fuzzy set is assigned to a linguistic variable, it imposes an elastic constraint, called a possibility distribution, on the possible values of the variable. Fuzzy logic is a rigorous mathematical discipline, and fuzzy reasoning is a precise formalism for encoding human knowledge in a mathematical framework. In fuzzy control, human knowledge is codified by means of linguistic IF-THEN rules, which build up a fuzzy inference system (FIS). FISs can approximate any continuous function on a compact domain to arbitrary accuracy [59]. Applications of fuzzy logic are found in many contexts, from medicine to finance, from human factors to consumer products, and from vehicle control to computational linguistics. Fuzzy logic is widely used in the industrial practice of advanced information technology, such as data analysis, regression, and signal and image processing. Like the multilayer perceptron (MLP) and the radial basis function network (RBFN), some FISs have universal function approximation capability. These systems can be used in many areas where neural networks are applicable.

The most distinguishing property of fuzzy logic is that it deals with fuzzy propositions, 
that is, propositions which contain fuzzy variables and fuzzy values, for example, "the 
temperature is high," "the height is short." The truth values for fuzzy propositions are not 
TRUE/FALSE only, as is the case in propositional Boolean logic, but include all the 
grayness between two extreme values. Fuzzy rules deal with fuzzy values such as, "high," 
"cold," "very low," etc. Those fuzzy concepts are usually represented by their 
membership functions (MF). A membership function shows the extent to which a value 
from a domain (also called universe) is included in a fuzzy concept. Fuzzy inference, which is built on fuzzy logic, takes inputs, applies fuzzy rules, and produces outputs. Inputs to a fuzzy system can be either exact (crisp) values or fuzzy values. Output values from a fuzzy system can be fuzzy, for example, a whole membership function for the inferred fuzzy value, or exact (crisp), in which case a single value is produced at the output. The process of transforming an output membership function into a single value is called defuzzification. The key to the success of fuzzy systems is that they are easy to implement, easy to maintain, easy to understand, robust, and cheap.
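A membership function can be made concrete with the common triangular shape. In the sketch below, the fuzzy value WARM (feet at 15 and 35 degrees, peak at 25) and the crisp set used for comparison are hypothetical choices, not taken from the text; the point is that the truth of "the temperature is warm" can take any value in [0, 1], whereas the crisp set allows only 0 or 1.

```python
def triangular_mf(x, a, b, c):
    """Membership degree of crisp x in a triangular fuzzy set.

    a and c are the feet (membership 0), b is the peak (membership 1); a < b < c.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def crisp_warm(x, low=20.0, high=30.0):
    """Crisp counterpart: membership is either 0 or 1, with no grayness in between."""
    return 1.0 if low <= x <= high else 0.0


# Hypothetical fuzzy value WARM for temperature in degrees C: feet 15 and 35, peak 25.
for temp in (10.0, 20.0, 25.0, 30.0):
    print(temp, triangular_mf(temp, 15.0, 25.0, 35.0), crisp_warm(temp))
```

At 20 degrees the proposition is true to degree 0.5 under the fuzzy definition but fully true under the crisp one; the crisp set is the special case in which the membership function takes only the values 0 and 1.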

A fuzzy system is defined by three main components: 

1. Fuzzy input and output variables, defined by their fuzzy values

2. A set of fuzzy rules 

3. Fuzzy inference mechanism 

3.4.1 Fuzzy Inference Systems and Fuzzy Controllers

Fuzzy logic-based controllers are popular control systems. The general architecture of a 
fuzzy controller is depicted in Fig. 3.4. Fuzzy controllers are knowledge based, where the knowledge is defined by fuzzy IF-THEN rules. The core of a fuzzy controller is an FIS, in which the data flow involves fuzzification, knowledge base evaluation, and defuzzification. In an FIS, the knowledge base comprises the fuzzy rule base and the database. The database contains the linguistic term sets considered in the linguistic rules, the MFs defining the semantics of the linguistic variables, and information about domains. The rule base contains a collection of linguistic rules that are joined by the ALSO operator. An expert provides his or her knowledge in the form of linguistic rules. The fuzzification process collects the inputs and then converts them into linguistic values or fuzzy sets. The decision logic, called the inference engine, generates the output from the input, and finally the defuzzification process produces a crisp output for control action. FISs are universal approximators capable of performing nonlinear mappings between inputs and outputs. The interpretations of a certain rule and of the rule base depend on the FIS model. The Mamdani [60] and the TSK [61] models are two popular FISs. The Mamdani model is a nonadditive fuzzy model that aggregates the output of fuzzy rules using the maximum operator, while the TSK model is an additive fuzzy model that aggregates the output of rules using the addition operator. Kosko's standard additive model (SAM) [62] is another additive fuzzy model. All these models can be derived from the fuzzy graph [63], and are universal approximators [59, 64, 65, 66]. Both neural networks and
fuzzy logic can be used to approximate an unknown control function. Neural networks 
achieve a solution using the learning process, while FISs apply a vague interpolation 
technique. FISs are appropriate for modeling nonlinear systems whose mathematical 
models are not available. Unlike neural networks and other numerical models, fuzzy 
models operate at a level of information granules — fuzzy sets. 


Figure 3.4: A model of Fuzzy controller. 
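The contrast between the nonadditive Mamdani model and the additive TSK model can be sketched in a few lines. The choices below are assumptions for illustration: Mamdani rules clip their triangular consequent MFs at the firing strength (min implication) and are combined with max on a discretized output universe, while TSK rules have crisp consequent functions combined as a firing-strength-weighted average (a normalized sum). All rule numbers are hypothetical.

```python
def tri(y, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if y <= a or y >= c:
        return 0.0
    return (y - a) / (b - a) if y <= b else (c - y) / (c - b)


def mamdani_aggregate(ys, rules):
    """Mamdani: clip each consequent MF at its firing strength (min), combine with max.

    rules is a list of (firing_strength, (a, b, c)) pairs.
    """
    return [max(min(w, tri(y, *abc)) for w, abc in rules) for y in ys]


def tsk_output(x, rules):
    """TSK: crisp rule outputs f(x), summed with firing-strength weights and normalized.

    rules is a list of (firing_strength, consequent_function) pairs.
    """
    num = sum(w * f(x) for w, f in rules)
    den = sum(w for w, _ in rules)
    return num / den


ys = [float(y) for y in range(11)]  # discretized output universe 0..10
agg = mamdani_aggregate(ys, [(0.7, (0.0, 2.0, 4.0)), (0.4, (3.0, 6.0, 9.0))])
y_tsk = tsk_output(1.0, [(0.7, lambda x: 2.0 * x), (0.4, lambda x: 5.0)])
print(agg)
print(y_tsk)
```

The Mamdani result is still a fuzzy set (a list of membership degrees) that must be defuzzified, whereas the TSK result is already a crisp number.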

3.4.2 Fuzzy Rules and Fuzzy Inference

There are two types of fuzzy rules, namely, fuzzy mapping rules and fuzzy implication rules [63]. A fuzzy mapping rule describes a functional mapping relationship between inputs and an output using linguistic terms, while a fuzzy implication rule describes a generalized logic implication relationship between two logic formulas involving linguistic variables. Fuzzy implication rules generalize set-to-set implications, whereas fuzzy mapping rules generalize set-to-set associations. The former was motivated to allow intelligent systems to draw plausible conclusions in a way similar to human reasoning, while the latter was motivated to approximate complex relationships such as nonlinear functions in a cost-effective and easily comprehensible way. The foundation of the fuzzy mapping rule is the fuzzy graph, while the foundation of the fuzzy implication rule is a generalization of two-valued logic. A rule base consists of a number of rules in the IF-THEN form: IF condition, THEN action. The condition, also called the premise, is made up of a number of antecedents that are negated or combined by different operators such as AND or OR, computed with t-norms or t-conorms. In a fuzzy-rule system, some variables are linguistic variables, and the determination of the MF for each fuzzy subset is critical. MFs can be selected according to human intuition or learned from training data. Fuzzy logic can be used as the basis for inference systems. A fuzzy inference is made up of several rules with the same output variables. Given a set of fuzzy rules, the inference result is a combination of the fuzzy values of the conditions and the corresponding actions. For example, consider the following set of rules:
$$R_i:\ \text{IF}\ (\text{condition} = C_i)\ \text{THEN}\ (\text{action} = A_i) \tag{3.23}$$

for $i = 1, 2, 3$, where $C_i$ is a fuzzy set. Assume that a condition has a membership degree of $\mu_i$ associated with the set $C_i$. The condition is first converted into a fuzzy category using a syntactical representation, in which "+" and the fraction bars are not arithmetic operations: each term $\mu_i/C_i$ simply pairs a membership degree with its fuzzy set:

$$\text{condition} = \frac{\mu_1}{C_1} + \frac{\mu_2}{C_2} + \frac{\mu_3}{C_3} \tag{3.24}$$

A fuzzy inference is the combination of all the possible consequences. The action coming from a fuzzy inference is also a fuzzy category:

$$\text{action} = \frac{\mu_1}{A_1} + \frac{\mu_2}{A_2} + \frac{\mu_3}{A_3} \tag{3.25}$$

This is also a syntactical representation. The inference procedure depends on fuzzy reasoning. This result can be further processed or transformed into a crisp value.
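The syntactical representation of Eqs. (3.24) and (3.25) maps naturally onto a dictionary from set names to membership degrees. The sketch below assumes three hypothetical temperature conditions with triangular MFs and the rule pairs (COLD, HEAT), (WARM, HOLD), (HOT, COOL); these names, the breakpoints, and the max combination for rules that share an action are illustrative assumptions.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical condition sets C_i and the rule base R_i: C_i -> A_i of Eq. (3.23).
CONDITION_SETS = {
    "COLD": lambda x: tri(x, 0.0, 10.0, 20.0),
    "WARM": lambda x: tri(x, 10.0, 20.0, 30.0),
    "HOT": lambda x: tri(x, 20.0, 30.0, 40.0),
}
RULES = {"COLD": "HEAT", "WARM": "HOLD", "HOT": "COOL"}


def to_category(x):
    """Eq. (3.24): condition = mu_1/C_1 + mu_2/C_2 + mu_3/C_3, stored as {C_i: mu_i}."""
    return {name: mf(x) for name, mf in CONDITION_SETS.items()}


def infer(condition):
    """Eq. (3.25): action = mu_1/A_1 + mu_2/A_2 + mu_3/A_3, stored as {A_i: mu_i}.

    Rules with zero membership degree do not contribute; if two rules shared an
    action, the larger degree would be kept (an assumed max combination).
    """
    action = {}
    for c_name, mu in condition.items():
        if mu > 0.0:
            a_name = RULES[c_name]
            action[a_name] = max(action.get(a_name, 0.0), mu)
    return action


condition = to_category(18.0)
print(condition)
print(infer(condition))
```

For an input of 18 degrees only the COLD and WARM rules fire, with degrees 0.2 and 0.8, so the inferred action is the fuzzy category 0.2/HEAT + 0.8/HOLD, which a defuzzifier can then turn into a crisp value.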

3.4.3 Fuzzification and Defuzzification

Fuzzification transforms crisp inputs into fuzzy subsets. Given crisp inputs $x_i$, $i = 1, \ldots, n$, fuzzification constructs the same number of fuzzy sets $A_i$:

$$A_i = \text{fuzz}(x_i) \tag{3.26}$$

where $\text{fuzz}(\cdot)$ is a fuzzification operator. Fuzzification is determined according to the defined MFs.

Defuzzification maps fuzzy subsets of real numbers into real numbers. In an FIS, defuzzification is applied after aggregation. Defuzzification is necessary in fuzzy controllers, since machines cannot understand control signals in the form of a complete fuzzy set. Popular defuzzification methods include the centroid defuzzifier [54] and the mean-of-maxima defuzzifier [60]. The centroid defuzzifier is the best-known method; it finds the centroid of the area enclosed by the MF and the horizontal axis. A discrete centroid defuzzifier is given by [67]

$$\text{defuzz}(B) = \frac{\sum_{i=1}^{K} y_i \, \mu_B(y_i)}{\sum_{i=1}^{K} \mu_B(y_i)} \tag{3.26}$$

where $K$ is the number of quantization steps by which the universe of discourse $Y$ of the MF $\mu_B(y)$ is discretized. Aggregation and defuzzification can be combined into a single phase, such as the weighted-mean method [68]

$$\text{defuzz}(B) = \frac{\sum_{i=1}^{N_r} \mu_i \, b_i}{\sum_{i=1}^{N_r} \mu_i} \tag{3.27}$$

where $N_r$ is the number of rules, $\mu_i$ is the degree of activation of the $i$th rule, and $b_i$ is a numerical value associated with the consequent $B_i$ of the $i$th rule. The parameter $b_i$ can be selected as the mean value of the $\alpha$-level set when $\alpha$ is equal to 1 [68].
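Both defuzzifiers reduce to a few lines. The sketch below implements the discrete centroid formula and the weighted-mean formula (3.27) as written above; the sampled output MF and the rule values $b_1 = 2$, $b_2 = 6$ are hypothetical, and the function names are assumptions.

```python
def centroid_defuzz(ys, mus):
    """Discrete centroid defuzzifier: sum(y_i * mu_B(y_i)) / sum(mu_B(y_i))."""
    den = sum(mus)
    if den == 0.0:
        raise ValueError("fuzzy set has zero total membership; centroid undefined")
    return sum(y * m for y, m in zip(ys, mus)) / den


def weighted_mean_defuzz(activations, bs):
    """Weighted-mean method, Eq. (3.27): sum(mu_i * b_i) / sum(mu_i) over the N_r rules."""
    den = sum(activations)
    if den == 0.0:
        raise ValueError("no rule is activated")
    return sum(m * b for m, b in zip(activations, bs)) / den


# Hypothetical aggregated output MF sampled at K = 5 quantization steps.
ys = [0.0, 1.0, 2.0, 3.0, 4.0]
mus = [0.0, 0.5, 1.0, 0.5, 0.0]
print(centroid_defuzz(ys, mus))
print(weighted_mean_defuzz([0.7, 0.4], [2.0, 6.0]))
```

The first call returns 2.0 because the sampled set is symmetric about $y = 2$; the second gives $(0.7 \cdot 2 + 0.4 \cdot 6)/1.1 \approx 3.45$.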

3.4.4 Complex Fuzzy Logic 

Complex fuzzy sets and logic are mathematical extensions of fuzzy sets and logic from 
the real domain to the complex domain [69,70]. A complex fuzzy set S is characterized 
by a complex-valued MF, and the membership degree of any element $x$ in $S$ is given by a complex value of the form

$$\mu_S(x) = r_S(x)\, e^{j\omega_S(x)}, \qquad j = \sqrt{-1} \tag{3.28}$$

where the amplitude $r_S(x) \in [0, 1]$ and $\omega_S(x)$ is the phase; that is, $\mu_S(x)$ lies within the unit circle in the complex plane. As in the development of fuzzy logic theory, the design of the set-theoretic operations is of vital importance in complex fuzzy logic. The basic set
operators for fuzzy logic have been extended for the complex fuzzy logic, and some 
additional operators such as the vector aggregation, set rotation and set reflection, are 
also defined [69,70]. The operations of intersection, union and complement for complex 
fuzzy sets are defined on the modulus of the complex membership degree without a 
consideration of its phase information. The complex fuzzy logic is extended to logic of 
vectors in the plane, rather than scalar quantities. A complex fuzzy set is defined as an 
MF mapping the complex plane into $[0, 1] \times [0, 1]$. Complex fuzzy sets are superior to
the Cartesian products of two fuzzy sets. Complex fuzzy logic maintains both the 
advantages of the fuzzy logic and the properties of complex fuzzy sets. In complex fuzzy 
logic, rules constructed are strongly related and a relation manifested in the phase term is 
associated with complex fuzzy implications. In a complex FIS, the output of each rule is 
a complex fuzzy set, and phase terms are necessary when combining multiple rules so as 
to generate the final output. Complex FISs are useful for solving problems in which rules 
are related to one another with the nature of the relation varying as a function of the input 
to the system [68]. These problems may be very difficult or impossible to solve using
traditional fuzzy methods. The fuzzy complex number [69] is a different concept from the 
complex fuzzy set [70]. The fuzzy complex number was introduced by incorporating the 
complex number into the support of the fuzzy set. A fuzzy complex number is a fuzzy set 
of complex numbers, which have real-valued membership degrees in the range [0, 1]. An
$\alpha$-cut of a fuzzy complex number is based on the modulus of the complex numbers in the
fuzzy set. The operations of addition, subtraction, multiplication and division for fuzzy 
complex numbers are derived using the extension principle, and closure of the set of 
fuzzy complex numbers is proved under each of these operators. In a nutshell, a fuzzy 
complex number is a fuzzy set in one dimension, while a complex fuzzy set or number is 
a fuzzy set in two dimensions. 
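A complex membership degree of the form (3.28) can be held directly in a Python complex number. The sketch below builds $\mu_S(x) = r_S(x) e^{j\omega_S(x)}$ with `cmath.rect` and, following the statement that intersection, union and complement act on the modulus only, applies min, max and $1 - r$ to the amplitudes. The choice of min/max/$1 - r$ (the standard fuzzy operators) and the sample degrees are assumptions; the phase is simply discarded by these operations.

```python
import cmath


def complex_membership(r, omega):
    """mu_S(x) = r_S(x) * exp(j * omega_S(x)) with amplitude r in [0, 1], Eq. (3.28)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("amplitude must lie in [0, 1]")
    return cmath.rect(r, omega)


# Assumed modulus-only operators (min, max, 1 - r); phase information is ignored.
def intersection_degree(mu_a, mu_b):
    return min(abs(mu_a), abs(mu_b))


def union_degree(mu_a, mu_b):
    return max(abs(mu_a), abs(mu_b))


def complement_degree(mu):
    return 1.0 - abs(mu)


mu_a = complex_membership(0.8, 0.5)  # hypothetical degrees of one element x in two sets
mu_b = complex_membership(0.3, 2.0)
print(abs(mu_a), cmath.phase(mu_a))
print(intersection_degree(mu_a, mu_b), union_degree(mu_a, mu_b), complement_degree(mu_a))
```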

3.5 The Evolutionary Algorithm 

Evolutionary algorithms (EAs) are a class of general-purpose stochastic optimization algorithms under the universally accepted neo-Darwinian paradigm. The neo-Darwinian paradigm is a combination of the classical Darwinian evolutionary theory, the selectionism of Weismann, and the genetics of Mendel [71]. EAs are currently a major approach to adaptation and optimization. They are search methods that take their inspiration from natural selection and survival of the fittest in the biological world. EAs differ from more traditional optimization techniques in that they involve a search from a "population" of solutions, not from a single point. Each iteration of an EA involves a competitive selection that weeds out poor solutions. The solutions with high "fitness" are "recombined" with other solutions by swapping parts of a solution with another. Solutions are also "mutated" by making a small change to a single element of the solution. Recombination and mutation are used to generate new solutions that are biased towards regions of the space for which good solutions have already been seen. The issues involved in the implementation and use of evolutionary algorithms are discussed in more detail below. Several different types of evolutionary algorithms were developed independently. These include genetic programming (GP), which evolves programs; evolutionary programming (EP), which focuses on optimizing continuous functions without recombination; evolution strategies (ES), which focus on optimizing continuous functions with recombination; and genetic algorithms (GAs) [72], which focus on optimizing general combinatorial problems.

For an optimization problem in a domain where calculus-based methods are difficult to implement or inapplicable, search methods such as EAs can be used. EAs are a class of stochastic search and optimization techniques guided by the principles of natural selection and genetics. They are population-based algorithms that simulate the natural evolution of biological systems. Individuals in a population compete and exchange information with one another. There are three basic genetic operations, namely, crossover, mutation, and selection. EAs are stochastic processes performing searches over a complex and multimodal space. They have several advantages:

• The EA approach is a general-purpose one that can be directly interfaced to 
existing simulations and models. 

• They are suitable for evaluation functions that are large, complex, non- 
continuous, non-differentiable, and multimodal. 

• They are extendable and easy to hybridize, so that they can reach a near-optimum or the global optimum.

• Evolutionary algorithms possess inherent parallelism by evaluating multiple points simultaneously. They also employ a structured, yet randomized, parallel multipoint search strategy that is biased toward reinforcing search points of high fitness.

3.5.1 The Terminologies Used 

To define the evolutionary algorithm, the following terminology, analogous to its biological counterparts, is used.

Population 

A set of individuals in a generation is called a population, $P(t) = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{N_P}\}$, where $\mathbf{x}_i$ is the $i$th individual and $N_P$ is the size of the population. The initial population is usually generated randomly, while the populations of later generations are generated by some selection and/or reproduction procedure.

Chromosome 

Each individual $\mathbf{x}_i$ in a population is a single chromosome. A chromosome, sometimes
called a genome, is a set of parameters that define a solution to the problem under 
consideration. The chromosome is often represented as a string in EAs. Biologically, a 
chromosome is a long, continuous piece of DNA that contains many genes, regulatory 
elements and other intervening nucleotide sequences. 

Gene 

In evolutionary algorithms, each chromosome $\mathbf{x}$ comprises a string of elements $x_i$, called genes, i.e., $\mathbf{x} = [x_1, x_2, \ldots, x_n]$, where $n$ is the number of genes in the chromosome. Each gene encodes a parameter of the problem into the chromosome. A gene is usually encoded as a binary string or a real number. In biology, genes are entities that parents pass to offspring during reproduction.

Allele 

The biological definition for an allele is any one of a number of alternative forms of the 
same gene occupying a given position called a locus on a chromosome. In the EA 
terminology, the value of a gene is indicated as an allele. 

Genotype 

Biologically, a genotype refers to the underlying genetic coding of a living organism,
usually in the form of DNA. The genotype of each organism corresponds to an 
observable, known as a phenotype. In evolutionary algorithms, a genotype represents a 
coded solution, that is, an individual’s chromosome. 

Phenotype 

Biologically, the phenotype of an organism is either its total physical appearance and 
constitution or a specific manifestation of a trait. A phenotype is determined by genotype 
or multiple genes and influenced by environmental factors. The concept of phenotypic
plasticity describes the degree to which an organism’s phenotype is determined by its 
genotype. A high level of plasticity means that environmental factors have a strong 
influence on the particular phenotype that develops. The ability to learn is the most 
obvious example of phenotypic plasticity. In EAs, a phenotype represents a decoded 
solution. 

Fitness 
Fitness in biology refers to the ability of an individual of a certain genotype to reproduce. The set of all possible genotypes and their respective fitness values is called a fitness landscape. A fitness function is a particular type of objective function that quantifies the optimality of a solution, i.e., a chromosome, in an EA. It is used to map an individual's chromosome into a positive number. Fitness is the value of the objective function for a chromosome $\mathbf{x}_i$, namely $f(\mathbf{x}_i)$. After the genotype is decoded, the fitness function is used to convert the phenotype's parameter values into the fitness. Fitness is used to rate the solutions.

Natural Selection 

Natural selection is believed to be the most important mechanism in the evolution of 
biological species. It alters biological populations over time by propagating heritable traits that affect the ability of individual organisms to survive and reproduce. It is concerned with those traits that help individuals to survive the environment and to reproduce. Natural selection is different from artificial selection. Genetic drift and gene flow are two other mechanisms in biological evolution. Gene flow, also known as gene migration, is the migration of genes from one population to another.

Genetic Drift 

Genetic drift is a contributing mechanism in biological evolution. As opposed to natural 
selection, genetic drift is a stochastic process that arises from random sampling in the 
reproduction. It changes allele frequencies (gene variations) in a population over many 
generations and affects traits that are more neutral. The genes of a new generation are a 
sampling from the genes of the successful individuals of the previous one, but with some 
statistical error. Drift is the cumulative effect over time of this sampling error on the allele frequencies in the population; as a result, traits that do not affect reproductive fitness can change in a population over time. Like selection, genetic drift acts on populations, altering allele frequencies and the predominance of traits. It occurs most rapidly in small populations and can lead some alleles to become extinct or to become the only alleles in the population, thus reducing the genetic diversity in the population.
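Drift can be illustrated with the classical neutral Wright-Fisher model, in which each new generation is a pure random sample of the previous one. This model, the function name `wright_fisher`, and the population sizes below are assumptions used for illustration; there is no selection, so every change in allele frequency comes from sampling error.

```python
import random


def wright_fisher(pop_size, p0, generations, rng):
    """Neutral Wright-Fisher drift of one allele's frequency (no selection).

    Each generation draws pop_size gene copies at random from the previous
    generation's allele frequency, so changes come only from sampling error.
    """
    p = p0
    history = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(pop_size) if rng.random() < p)
        p = count / pop_size
        history.append(p)
        if p in (0.0, 1.0):  # allele lost (extinct) or fixed (the only allele)
            break
    return history


rng = random.Random(42)
for n in (10, 1000):
    h = wright_fisher(n, 0.5, 200, rng)
    print(n, len(h) - 1, h[-1])
```

With 10 gene copies the allele is typically lost or fixed within a few dozen generations, while with 1000 copies its frequency is usually still far from 0 or 1 after 200 generations, in line with drift acting fastest in small populations.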

Termination Criterion 

The search process of an EA will terminate when a certain termination criterion is met. 
Otherwise a new generation will be produced and the search process continues. The 
termination criterion can be selected as a maximum number of generations, or the 
convergence of the genotypes of the individuals. Convergence of the genotypes occurs 
when all the bits or values in the same positions of all the strings are identical, and 
crossover has no further effect. Phenotypic convergence without genotypic convergence is also possible. For a given system, the objective values are required to be mapped into fitness values so that the fitness values are always greater than zero.
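The two termination criteria and the objective-to-fitness mapping can be sketched as small helper functions. The names and the particular mapping (worst cost minus cost, plus a small positive offset `eps`, for a minimization problem) are assumptions; many other positive mappings are in use.

```python
def genotypes_converged(population):
    """Genotypic convergence: every position holds the same bit/value in all strings."""
    first = population[0]
    return all(individual == first for individual in population[1:])


def should_terminate(population, generation, max_generations):
    """Stop at a maximum number of generations or on genotypic convergence."""
    return generation >= max_generations or genotypes_converged(population)


def objective_to_fitness(costs, eps=1e-6):
    """Map objective values to be minimized into strictly positive fitness values.

    Assumed scheme: fitness = (worst cost - cost) + eps, so the best solution
    gets the largest fitness and every fitness is greater than zero.
    """
    worst = max(costs)
    return [worst - c + eps for c in costs]


print(should_terminate([[1, 0, 1], [1, 0, 1]], 3, 100))
print(objective_to_fitness([3.0, -1.0, 2.0]))
```

Once all strings are identical, crossover can only reproduce the same string, so continuing the search would rely on mutation alone.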

3.5.2 The Genetic Algorithm 

Genetic Algorithms are unorthodox search or optimization algorithms, which were first 
suggested by John Holland in 1975. The GA [73] is the most popular form of EAs. 
Concisely stated, a genetic algorithm is a programming technique that mimics biological 
evolution as a problem-solving strategy. Given a specific problem to solve, the input to 
the GA is a set of potential solutions to that problem, encoded in some fashion, and a 
metric called a fitness function that allows each candidate to be quantitatively evaluated.
These candidates may be solutions already known to work, with the aim of the GA being 
to improve them, but more often they are generated at random. The GA then evaluates 
each candidate according to the fitness function. In a pool of randomly generated 
candidates, of course, most will not work at all, and these will be deleted. However, 
purely by chance, a few may hold promise - they may show activity, even if only weak 
and imperfect activity, toward solving the problem. These promising candidates are kept 
and allowed to reproduce. Multiple copies are made of them, but the copies are not perfect; random changes are introduced during the copying process. These digital offspring then go on to the next generation, forming a new pool of candidate solutions, and are subjected to a second round of fitness evaluation. Those
candidate solutions which were worsened, or made no better, by the changes to their code 
are again deleted; but again, purely by chance, the random variations introduced into the 
population may have improved some individuals, making them into better, more 
complete or more efficient solutions to the problem at hand. Again these winning 
individuals are selected and copied over into the next generation with random changes, 
and the process repeats. The expectation is that the average fitness of the population will 
increase each round, and so by repeating this process for hundreds or thousands of 
rounds, very good solutions to the problem can be discovered. 
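The evaluate-select-copy-vary loop described above can be sketched as a minimal generational GA on bit strings. Everything specific here is an illustrative assumption: the OneMax test problem (fitness is the number of 1 bits), binary tournament selection, one-point crossover, bit-flip mutation, elitism, and all parameter values. The sketch also keeps a fixed population size instead of literally deleting failed candidates.

```python
import random


def run_ga(fitness, n_bits=20, pop_size=30, generations=100,
           p_cross=0.9, p_mut=0.01, seed=0):
    """Minimal generational GA on bit strings; returns the best string and the
    best fitness recorded in each generation."""
    rng = random.Random(seed)
    # Initial candidates are generated at random.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def pick():
        # Binary tournament: the fitter of two random candidates reproduces.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    history = []
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        children = [best[:]]  # elitism: the best candidate survives unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < p_cross:  # one-point crossover
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Random changes: flip each bit with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# OneMax: the fitness of a bit string is its number of 1s (a standard toy problem).
best, history = run_ga(sum)
print(history[0], history[-1], best)
```

Because the best individual is always copied unchanged, the recorded best fitness can never decrease from one generation to the next.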

As astonishing and counterintuitive as it may seem to some, genetic algorithms have 
proven to be an enormously powerful and successful problem-solving strategy, 
dramatically demonstrating the power of evolutionary principles. Genetic algorithms 
have been used in a wide variety of fields to evolve solutions to problems as difficult as
or more difficult than those faced by human designers. Moreover, the solutions they 
come up with are often more efficient, more elegant, or more complex than anything 
comparable a human engineer would produce. In some cases, genetic algorithms have 
come up with solutions that baffle the programmers who wrote the algorithms in the first 
place. 

3.5.2.1 Representation or Encoding/Decoding

The GA uses binary coding. A chromosome $\mathbf{x}$ is a potential solution, denoted by a concatenation of the parameters $\mathbf{x} = [x_1, x_2, \ldots, x_n]$, where each $x_i$ is a gene and the value of $x_i$ is an allele. $\mathbf{x}$ is encoded as a binary string, a sequence of 1s and 0s, where the digit at each position represents the value of some aspect of the solution. Another, similar approach is to encode solutions as arrays of integers or decimal numbers, with each position again representing some particular aspect of the solution. This approach allows for greater precision and complexity than the comparatively restricted method of using binary numbers only, and it often "is intuitively closer to the problem space".
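A common way to put real-valued parameters into a binary chromosome is to quantize each one to a fixed number of bits and concatenate the resulting genes. The sketch below assumes uniform quantization of each parameter over a known range; the bit width, the ranges, and the function names are hypothetical. Decoding the chromosome is the genotype-to-phenotype step, and its precision is limited to half a quantization step.

```python
def encode_param(value, lo, hi, n_bits):
    """Quantize a real parameter in [lo, hi] into an n_bits binary gene."""
    if not lo <= value <= hi:
        raise ValueError("value outside the parameter range")
    levels = 2 ** n_bits - 1
    k = round((value - lo) / (hi - lo) * levels)
    return [int(bit) for bit in format(k, f"0{n_bits}b")]


def decode_param(bits, lo, hi):
    """Map a binary gene back to a real value in [lo, hi]."""
    levels = 2 ** len(bits) - 1
    k = int("".join(str(bit) for bit in bits), 2)
    return lo + (hi - lo) * k / levels


def encode_chromosome(params, bounds, n_bits):
    """Concatenate the genes x_1, ..., x_n into one bit-string chromosome."""
    chrom = []
    for value, (lo, hi) in zip(params, bounds):
        chrom += encode_param(value, lo, hi, n_bits)
    return chrom


def decode_chromosome(chrom, bounds, n_bits):
    """Split the chromosome into n_bits-long genes and decode each (genotype -> phenotype)."""
    return [decode_param(chrom[i * n_bits:(i + 1) * n_bits], lo, hi)
            for i, (lo, hi) in enumerate(bounds)]


bounds = [(-1.0, 1.0), (0.0, 10.0)]  # hypothetical parameter ranges
chrom = encode_chromosome([0.25, 7.3], bounds, n_bits=8)
print(chrom)
print(decode_chromosome(chrom, bounds, n_bits=8))
```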

A third approach is to represent individuals in a GA as strings of letters, where each letter 
again stands for a specific aspect of the solution. One example of this technique is 
Hiroaki Kitano's "grammatical encoding" approach, where a GA was put to the task of 
evolving a simple set of rules called a context-free grammar that was in turn used to 
generate neural networks for a variety of problems [74]. 
Another strategy, developed principally by John Koza [75] of Stanford University and called genetic programming, represents programs as branching data structures called trees. In this approach, random changes can be brought about by changing the operator or altering the value at a given node in the tree, or by replacing one subtree with another.

3.5.2.2 Selection or Reproduction 

Selection embodies the principle of survival of the fittest, which provides a driving force 
in the GA. Selection is based on the fitness of the individuals. There are many different 
techniques which a genetic algorithm can use to select the individuals to be copied over 
into the next generation, but listed below are some of the most common methods. Some 
of these methods are mutually exclusive, but others can be and often are used in 
combination. 

Elitist selection: The fittest members of each generation are guaranteed to be selected. This technique, also known as elitism, is the most commonly used one. The elitism strategy of selecting the individual with the best fitness can improve the convergence of the GA [74]; it always copies the best individual of a generation to the next generation. Although elitism may increase the possibility of premature convergence, it improves the performance of the GA in most cases and thus is integrated in most GA implementations [76].

Fitness-proportionate selection: More fit individuals are more likely, but not certain, to be selected.
Roulette-wheel selection: A form of fitness-proportionate selection in which the chance of an individual's being selected is proportional to the amount by which its fitness is greater or less than its competitors' fitness.

Scaling selection: As the average fitness of the population increases, the strength of the 
selective pressure also increases and the fitness function becomes more discriminating. 

Tournament selection: Subgroups of individuals are chosen from the larger population, 
and members of each subgroup compete against each other. Only one individual from 
each subgroup is chosen to reproduce. 

Rank selection: Each individual in the population is assigned a numerical rank based on fitness, and selection is based on this ranking rather than absolute differences in fitness.

Generational selection: The offspring of the individuals selected from each generation become the entire next generation. No individuals are retained between generations.

Steady-state selection: The offspring of the individuals selected from each generation go 
back into the pre-existing gene pool, replacing some of the less fit members of the 
previous generation. Some individuals are retained between generations. 

Hierarchical selection: Individuals go through multiple rounds of selection each 
generation. Lower-level evaluations are faster and less discriminating, while those that 
survive to higher levels are evaluated more rigorously. 
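The sketch below illustrates three of the schemes listed above in Python: roulette-wheel selection (in its common form, where the selection probability is proportional to fitness), tournament selection and rank selection. It assumes maximization and non-negative fitness values; the function names and the default tournament size of 3 are illustrative choices, not taken from the text.

```python
import random
from typing import Sequence


def roulette_wheel(fitness: Sequence[float], rng: random.Random) -> int:
    """Return an index with probability proportional to its non-negative fitness."""
    total = sum(fitness)
    if total <= 0:
        return rng.randrange(len(fitness))  # degenerate case: uniform choice
    spin = rng.random() * total  # a point on the wheel in [0, total)
    cumulative = 0.0
    for i, f in enumerate(fitness):
        cumulative += f
        if spin < cumulative:
            return i
    return len(fitness) - 1  # guard against floating-point round-off


def tournament(fitness: Sequence[float], rng: random.Random, size: int = 3) -> int:
    """Draw `size` distinct competitors at random and return the fittest one."""
    competitors = rng.sample(range(len(fitness)), size)
    return max(competitors, key=lambda i: fitness[i])


def rank_selection(fitness: Sequence[float], rng: random.Random) -> int:
    """Roulette wheel over ranks (worst = 1, best = N) instead of raw fitness."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    ranks = [0] * len(fitness)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return roulette_wheel(ranks, rng)


rng = random.Random(0)
fit = [1.0, 4.0, 2.0, 8.0]
counts = [0] * len(fit)
for _ in range(10_000):
    counts[roulette_wheel(fit, rng)] += 1
print(counts)                        # index 3 (8/15 of the total fitness) dominates
print(tournament(fit, rng, size=4))  # all four compete, so index 3 wins
print(rank_selection(fit, rng))
```

Rank selection removes the dependence on absolute fitness differences: doubling the best fitness value would change the roulette-wheel probabilities but leave the rank-based ones unchanged.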
3.5.2.3 Replacement or Change

During the selection procedure, we need to decide how many individuals in one population will be replaced by the newly generated individuals so as to produce the population for the new generation. Thus, the selection mechanism is split into two phases, namely, parental selection and replacement strategy. There are many replacement strategies, such as complete generational replacement [77], replace-random, replace-worst, replace-oldest, and deletion by kill tournament [78]. In the crowding strategy, an offspring replaces the parent it most resembles, with similarity measured by the Hamming distance. These replacement strategies may result in a situation where the best individuals in a generation fail to reproduce. This problem can be solved by introducing into the system a new variable that stores the best individuals obtained so far. The elitism strategy cures the same problem without changing the system state. The new individuals themselves are produced by two basic operators: the first and simplest is mutation, and the other is crossover.
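The crowding and replace-worst strategies can be sketched as follows, assuming bit-string chromosomes stored in a Python list and fitness to be maximized. The helper names are hypothetical, and the replace-worst variant shown is the steady-state form, in which a child displaces the worst member only if it is fitter.

```python
from typing import List, Sequence

Bits = List[int]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def crowding_replace(population: List[Bits], parent_idx: Sequence[int], child: Bits) -> int:
    """Crowding: overwrite the parent the child most resembles and return its index."""
    target = min(parent_idx, key=lambda i: hamming(population[i], child))
    population[target] = child
    return target


def replace_worst(population: List[Bits], fitness: List[float],
                  child: Bits, child_fit: float) -> bool:
    """Steady-state replace-worst: the child displaces the least fit member only if fitter."""
    worst = min(range(len(population)), key=lambda i: fitness[i])
    if child_fit <= fitness[worst]:
        return False
    population[worst], fitness[worst] = child, child_fit
    return True


pop = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0]]
print(crowding_replace(pop, parent_idx=[0, 1], child=[1, 1, 1, 0]))  # 1: distance 1 vs 3
```

Because crowding replaces a similar individual rather than the worst one, it tends to preserve diversity, whereas replace-worst raises selective pressure.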

Both crossover and mutation are considered the driving forces of evolution. In biology, crossover occurs when two parent chromosomes, normally two homologous instances of the same chromosome, break and then reconnect to each other's end pieces. Mutations can be caused by copying errors in the genetic material during cell division and by external environmental factors. Although the overwhelming majority of mutations have no real effect, some can cause disease in organisms due to partially or fully nonfunctional proteins arising from errors in the protein sequence.



Chapter 3: Soft Computing: Techniques, Models & Applications 


Crossover is the primary exploration operator in the GA; it searches the range of possible solutions based on existing solutions. Crossover, a binary operator, exchanges information between two selected parent chromosomes at randomly selected positions and produces two new offspring (individuals). In general, both children differ from either parent, yet retain some features of both. The method of crossover is highly dependent on the method of genetic coding. Some of the commonly used crossover techniques [79] are the one-point crossover, the two-point crossover, the multipoint crossover, and the uniform crossover. The crossover points are typically at the same random positions for both parent chromosomes. These crossover operators are illustrated in Figure 3.5.


[Figure 3.5 panels: "Parents" shows two strings, A B C D E F G H I J and a b c d e f g h i j; "Children" shows the two strings obtained by exchanging segments between them.]


Figure 3.5: Generalized crossover. 

The one-point crossover requires one crossover point on the parent chromosomes, and all the data beyond that point are swapped between the two parent chromosomes. The one-point crossover is easy to model analytically, but it generates a bias toward bits at the ends of the strings. In the two-point crossover, two crossover points are selected and everything between the two points is swapped. The two-point crossover causes a smaller schema disruption than the one-point crossover; it eliminates this disadvantage of the one-point crossover, but generates bias at a different level. Multipoint crossover treats each string as a ring of bits divided by m crossover points into m segments, and each segment is exchanged with a fixed probability. Uniform crossover exchanges bits of a string rather than segments. Individual bits in the parent chromosomes are compared, and each of the non-matching bits is swapped with a fixed probability, typically 0.5. The uniform crossover is unbiased with respect to defining length. In the half-uniform crossover, exactly half of the non-matching bits are swapped.
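A minimal Python sketch of the one-point, two-point and uniform crossover operators described above follows, assuming equal-length list chromosomes (at least three genes for the two-point version). Cut points are drawn at the same random positions for both parents, as the text describes; the function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[List[int], List[int]]


def one_point(p1: Sequence[int], p2: Sequence[int], rng: random.Random) -> Pair:
    """Swap all genes beyond a single random cut point."""
    cut = rng.randint(1, len(p1) - 1)
    return list(p1[:cut]) + list(p2[cut:]), list(p2[:cut]) + list(p1[cut:])


def two_point(p1: Sequence[int], p2: Sequence[int], rng: random.Random) -> Pair:
    """Swap the segment between two distinct random cut points."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    c1 = list(p1[:a]) + list(p2[a:b]) + list(p1[b:])
    c2 = list(p2[:a]) + list(p1[a:b]) + list(p2[b:])
    return c1, c2


def uniform(p1: Sequence[int], p2: Sequence[int], rng: random.Random,
            p_swap: float = 0.5) -> Pair:
    """Swap each non-matching gene independently with probability p_swap."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(c1)):
        if c1[i] != c2[i] and rng.random() < p_swap:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2


rng = random.Random(1)
parents = ([0] * 10, [1] * 10)
for op in (one_point, two_point, uniform):
    print(op.__name__, op(*parents, rng))
```

Because every gene of a child comes from the same position in one of the parents, each position of the two children together always holds exactly the two parental genes; only their assignment changes.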

Mutation 

Mutation is a unary operator that requires only one parent to generate an offspring. A mutation operator typically selects a random position of a random chromosome and replaces the corresponding gene or bit with other information. Mutation helps to reintroduce lost alleles into the population. Mutations can be classified into point mutations and large-scale mutations. Point mutations are changes to a single position, which can be substitutions, deletions, or insertions of a gene or a bit. Large-scale mutations can be similar to point mutations, but operate at multiple positions simultaneously, or at one point with multiple genes or bits, or even on the chromosome scale. Functionally, mutations introduce the necessary amount of noise to do hill climbing. Two additional large-scale mutation operators are the inversion and rearrangement operators. The inversion operator [73] selects the portion between two randomly selected positions within a chromosome and reverses it. Inversion reshuffles the order of the genes in order to achieve a better evolutionary potential. However, its computational advantage over other conventional genetic operators is not clear. The swap operator is the most primitive reordering operator, from which many new unary operators, including inversion, can be derived. The rearrangement operator reshuffles a portion of a chromosome such that the juxtaposition of the genes or bits is changed. Some mutation operations are illustrated in Figure 3.6.
[Figure 3.6 panels, each showing a parent string of genes A to J and the resulting child: b. Mutation by deletion; c. Mutation by duplication; d. Mutation by inversion.]

Figure 3.6: Different types of mutation. The letters denote bits or genes.
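The substitution, swap, inversion and deletion operations can be sketched as below. This is an illustrative Python version, assuming list chromosomes; positions are drawn uniformly at random, which is one common choice rather than something prescribed by the text.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bit_flip(chrom: Sequence[int], rng: random.Random, p_m: float) -> List[int]:
    """Point mutation by substitution: flip each bit independently with probability p_m."""
    return [1 - g if rng.random() < p_m else g for g in chrom]


def swap(chrom: Sequence[T], rng: random.Random) -> List[T]:
    """Exchange the genes at two distinct random positions."""
    out = list(chrom)
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out


def inversion(chrom: Sequence[T], rng: random.Random) -> List[T]:
    """Reverse the segment between two distinct random positions (inclusive)."""
    out = list(chrom)
    i, j = sorted(rng.sample(range(len(out)), 2))
    out[i:j + 1] = out[i:j + 1][::-1]
    return out


def deletion(chrom: Sequence[T], rng: random.Random) -> List[T]:
    """Large-scale mutation by deletion: remove a random contiguous segment."""
    out = list(chrom)
    i, j = sorted(rng.sample(range(len(out)), 2))
    del out[i:j]  # removes j - i >= 1 genes and always keeps at least one
    return out


rng = random.Random(7)
genes = list("ABCDEFGHIJ")
print("".join(swap(genes, rng)))
print("".join(inversion(genes, rng)))
print("".join(deletion(genes, rng)))
print(bit_flip([0, 1, 1, 0, 1], rng, p_m=0.2))
```

Each operator returns a new list and leaves the parent unchanged, so the parent can still take part in crossover or elitist copying within the same generation.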

A high mutation rate can turn the genetic search into a random search. A high mutation rate may change the value of an important bit, and thus slow down the convergence toward a good solution or slow down convergence in the final stage of the iterations.

Thus, mutation is applied only occasionally in the GA. In the simple GA, mutation is typically a substitution operation that changes one random bit in the chromosome at a time. An empirically derived formula that can be used as the probability of mutation at a starting point is given by:

(3.29)

where T is the total number of generations and l is the string length. The random nature of mutation and its low probability of occurrence make the convergence of the GA slow.

The search process can be expedited by using the directed mutation technique [80] that 
deterministically introduces new points into the population by using gradient or 
extrapolation of the information acquired so far. In passing, it may be mentioned that the 
relative importance of crossover and mutation has been discussed in the GA community 
and no compelling conclusion has been drawn. It is commonly agreed that crossover 
plays a more important role if the population size is large, and mutation is more important 
if the population size is small. 
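To show how the pieces fit together, here is a hedged sketch of a simple GA on the toy OneMax problem (maximize the number of ones in a bit string), combining elitism, binary tournament selection, one-point crossover and bit-flip mutation. Since Eq. (3.29) is not reproduced here, the per-bit mutation rate 1/l is used instead as a common rule of thumb; it, the population size, the crossover probability and the generation count are all assumptions.

```python
import random
from typing import List, Tuple


def onemax(bits: List[int]) -> int:
    """Toy fitness: the number of ones (the optimum is a string of all ones)."""
    return sum(bits)


def simple_ga(length: int = 30, pop_size: int = 40, generations: int = 60,
              p_c: float = 0.9, seed: int = 0) -> Tuple[List[int], int]:
    rng = random.Random(seed)
    p_m = 1.0 / length  # per-bit rate: a common rule of thumb, assumed here
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

    def select() -> List[int]:
        # Binary tournament selection on the current population.
        a, b = rng.sample(pop, 2)
        return a if onemax(a) >= onemax(b) else b

    for _ in range(generations):
        children = [list(max(pop, key=onemax))]  # elitism: best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < p_c:  # one-point crossover
                cut = rng.randint(1, length - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = list(p1), list(p2)
            for child in (c1, c2):
                if len(children) < pop_size:
                    # Bit-flip mutation of each new child.
                    children.append([1 - g if rng.random() < p_m else g for g in child])
        pop = children  # generational replacement apart from the elite
    best = max(pop, key=onemax)
    return best, onemax(best)


best, score = simple_ga()
print(score, "ones out of", len(best))
```

The elite copy makes the best fitness non-decreasing from one generation to the next, which is the convergence benefit of elitism mentioned earlier.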

3.6 Immune Algorithms 

In 1974, Jerne [81] proposed a network theory for the immune system based on the clonal selection theory [82]. The biological immune system has the features of immunological memory and immunological tolerance. It has self-protection and self-regulation mechanisms. The basic components of the immune system are two types of lymphocytes, namely B-lymphocytes and T-lymphocytes, which are cells produced by the bone marrow and by the thymus, respectively. B-lymphocytes generate antibodies on their surfaces to resist their specific antigens, while T-lymphocytes regulate the production of antibodies from B-lymphocytes. Roughly 10^7 distinct types of B-lymphocytes exist in a human body. An antibody recognizes and eliminates a specific type of antigen. The clonal selection theory describes the basic features of an immune response to an antigenic stimulus. Artificial immune networks [83] employ two types of dynamics. The short-term dynamics govern the increase or decrease of the concentration of a fixed set of lymphocyte clones and the corresponding immunoglobulins. The metadynamics govern the recruitment of new species from an enormous pool of lymphocytes freshly produced by the bone marrow. The short-term dynamics correspond to a set of cooperating or competing agents, while the metadynamics refine the results of the short-term dynamics. As a result, the short-term dynamics are closely related to neural networks and the metadynamics are similar to the GA. The immune algorithm, also called the clonal selection algorithm, introduces suppressor cells to change the search scope and memory cells to keep the candidate solutions. It is an EA inspired by the immune system, and is very similar to the GA. In the immune algorithm, the antigen is defined as the problem to be optimized, and an antibody is a solution to the objective function. Only those lymphocytes that recognize the antigens are selected to proliferate. The selected lymphocytes are subject to an affinity maturation process, which improves their affinity to the selective antigens. Learning in the immune system involves raising the relative population size and affinity of those lymphocytes. The immune algorithm first recognizes the antigen and produces antibodies from memory cells. Then it calculates the affinity between antibodies, which can be treated as fitness. Antibodies are dispersed to the memory cell, and the concentration of antibodies is controlled by stimulating or suppressing antibodies. A diversity of antibodies for capturing unknown antigens is generated using genetic reproduction operators. The immune mechanism can also be defined as a genetic operator and integrated into the GA [84]. The immune operator aims to overcome the blind action of crossover and mutation and to make the fitness of the population increase steadily. The immune operator is composed of two operations, namely vaccination and immune selection, and it utilizes reasonably selected vaccines to intervene in the variation of genes in an individual chromosome.
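A minimal clonal-selection sketch in the spirit of CLONALG is shown below for minimizing a continuous function. Antibodies are candidate solutions and affinity is the negated objective; the best antibodies are cloned in proportion to their rank, clones are hypermutated more strongly the lower their affinity, and a few random newcomers play the role of the metadynamics. All parameter values are assumptions, and suppressor cells and the vaccination operator of [84] are omitted.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def clonal_selection(objective: Callable[[Vector], float], dim: int = 2, pop_size: int = 20,
                     n_select: int = 5, clone_factor: int = 4, n_random: int = 2,
                     generations: int = 100, bounds: Tuple[float, float] = (-5.0, 5.0),
                     seed: int = 0) -> Tuple[Vector, float]:
    """Minimize `objective`; an antibody's affinity is taken as the negated objective."""
    rng = random.Random(seed)
    lo, hi = bounds

    def new_antibody() -> Vector:
        return [rng.uniform(lo, hi) for _ in range(dim)]

    pop = [new_antibody() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=objective)  # highest affinity (lowest objective) first
        clones: List[Vector] = []
        for rank, antibody in enumerate(pop[:n_select]):
            n_clones = clone_factor * (n_select - rank)     # better antibodies proliferate more
            step = 0.1 * (hi - lo) * (rank + 1) / n_select  # and are mutated less (affinity maturation)
            for _ in range(n_clones):
                clones.append([min(hi, max(lo, x + rng.gauss(0.0, step))) for x in antibody])
        # Keep the best antibodies and clones (memory), then recruit fresh random ones.
        pop = sorted(pop + clones, key=objective)[: pop_size - n_random]
        pop += [new_antibody() for _ in range(n_random)]
    best = min(pop, key=objective)
    return best, objective(best)


def sphere(v: Vector) -> float:
    return sum(x * x for x in v)


print(clonal_selection(sphere))
```

The inverse relation between affinity and mutation step is what distinguishes this scheme from an ordinary GA mutation: good antibodies are refined locally, while weaker ones explore more widely.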

3.7 Ant-colony Optimization 

The ACO [85-88] is a metaheuristic approach for solving discrete or continuous optimization problems such as COPs. The ACO heuristic was inspired by the foraging behavior of ants. Ants are capable of finding the shortest path between the food and the colony (nest) due to a simple pheromone-laying mechanism. The optimization is the result of the collective work of all the ants in the colony. Ants use their pheromone trails as a medium for communicating information. All the ants contribute to the pheromone reinforcement, and old trails vanish due to evaporation. Different ACO algorithms arise from different pheromone value update rules. The ant system [85] is an evolutionary approach, where several generations of artificial ants search for good solutions. Every ant of a generation builds up a complete solution, step by step, going through several decisions by choosing the nodes on a graph according to a probabilistic state transition
rule, called the random-proportional rule. The probability for ant $k$ at node $i$ moving to node $j$ at generation $t$ is defined by

$$P_{ij}^{k}(t) = \frac{\tau_{ij}(t)\, d_{ij}^{-\beta}}{\sum_{u \in J_i^{k}} \tau_{iu}(t)\, d_{iu}^{-\beta}} \qquad (3.30)$$

for $j \in J_i^k$, where $\tau_{ij}$ is the intensity of the pheromone on edge $i \to j$, $d_{ij}$ is the distance between nodes $i$ and $j$, $J_i^k$ is the set of nodes that remain to be visited by ant $k$ positioned at node $i$ to make the solution feasible, and $\beta > 0$ is a parameter that determines the relative importance of pheromone versus distance. A tabu list is used to save the nodes already visited during each generation. When a tour is completed, the tabu list is used to compute the ant's current solution. Once all the ants have built their tours, the pheromone is updated on all edges $i \to j$ according to a global pheromone updating rule

$$\tau_{ij}(t+1) = (1-\alpha)\,\tau_{ij}(t) + \sum_{k=1}^{N_P} \Delta\tau_{ij}^{k}(t) \qquad (3.31)$$

where $\Delta\tau_{ij}^{k}$ is the intensity of the pheromone on edge $i \to j$ contributed by ant $k$, taking $1/L_k$ if $i \to j$ is an edge used by ant $k$ and 0 otherwise, $\alpha \in (0, 1)$ is a pheromone decay parameter, $L_k$ is the length of the tour performed by ant $k$, and $N_P$ is the number of ants. Thus, a shorter tour gets a higher reinforcement. Each edge has an LTM to store the pheromone. The ant-colony system (ACS) [85] improves the ant system by applying a local pheromone updating rule during the construction of a solution. The global updating rule is applied only to edges that belong to the best ant tour. Some important subsets of ACO algorithms [86], such as the most successful ACS and MAX-MIN ant system (MMAS) algorithms, have been proved to converge to the global optimum [87].
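The ant system described by Eqs. (3.30) and (3.31) can be sketched in Python for a small symmetric travelling-salesman instance. The uniform initial pheromone, the parameter values, and the use of `random.Random.choices` to sample from the unnormalized weights tau_ij * d_ij^(-beta) are assumptions of this sketch; the local update of ACS and the pheromone bounds of MMAS are not included.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def tour_length(tour: Sequence[int], dist: Matrix) -> float:
    """Length of the closed tour, including the edge back to the start node."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def ant_system(dist: Matrix, n_ants: int = 10, generations: int = 50, alpha: float = 0.5,
               beta: float = 2.0, seed: int = 0) -> Tuple[List[int], float]:
    """Ant system for a symmetric TSP using rules (3.30) and (3.31)."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # uniform initial pheromone (an assumption)
    best_tour: List[int] = []
    best_len = math.inf
    for _ in range(generations):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            visited = {tour[0]}  # the tabu list of already visited nodes
            while len(tour) < n:
                i = tour[-1]
                allowed = [u for u in range(n) if u not in visited]  # J_i^k
                weights = [tau[i][u] * dist[i][u] ** (-beta) for u in allowed]  # numerators of (3.30)
                j = rng.choices(allowed, weights=weights)[0]  # choices() normalizes the weights
                tour.append(j)
                visited.add(j)
            tours.append(tour)
        for row in tau:  # evaporation: the (1 - alpha) term of (3.31)
            for j in range(n):
                row[j] *= 1.0 - alpha
        for tour in tours:  # each ant deposits 1/L_k on every edge it used
            length = tour_length(tour, dist)
            for idx in range(n):
                a, b = tour[idx], tour[(idx + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length  # symmetric problem: edges are undirected
            if length < best_len:
                best_tour, best_len = tour, length
    return best_tour, best_len


pts = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
print(ant_system(dist))
```

The six vertices of a regular hexagon with unit side form a convenient test case, since the optimal tour is the perimeter, of length 6.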

3.8 Hybrid Approaches

As far as computation speed is concerned, it is hard to say whether EAs can compete with the gradient-descent method. There is no clear winner in terms of the best training algorithm, since the best method is always problem dependent. For large networks, EAs may be inefficient. When gradient information is readily available, it can be used to speed up the evolutionary search; in this sense, the process of learning facilitates the process of evolution. The hybrid of evolution and gradient search is an effective alternative to the gradient-descent method in learning tasks when the global optimum is at a premium. The hybrid method is more efficient than either an EA or the gradient-descent method used alone. As is well known, EAs are good at global search but inefficient at fine-tuning local search. This is especially true for the GA. By incorporating a local-search procedure such as gradient descent into the evolution, the efficiency of evolutionary training can be improved significantly. Neural networks can be trained by alternating two steps, where an EA step is first used to locate a near-optimal region in the weight space and a local-search step is then used to find a local-optimal solution in that region [89]. Hybridization of EAs and local search can be based either on the Lamarckian strategy or on the Baldwin effect. Local search corresponds to the phenotypic plasticity in biological evolution. Since Hinton and Nowlan constructed the first computational model of the Baldwin effect in 1987 [90], hybrid methods based on the Lamarckian strategy and the Baldwin effect have been very successful, with numerous implementations. Although Lamarckian evolution is biologically implausible, it has proved effective in computer applications. Nevertheless, the Lamarckian strategy has been pointed out to distort the population so that the schema theorem no longer applies [91]. The Baldwin effect only alters the fitness landscape, and the basic evolutionary mechanism remains purely Darwinian. Thus, the schema theorem still applies to the Baldwin effect [91].
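The difference between the Lamarckian strategy and the Baldwin effect lies only in what happens to the genotype after local search. In the hedged Python sketch below, a few gradient-descent steps act as learning; both modes select on the learned fitness, but only the Lamarckian mode writes the learned parameters back into the population. The truncation-selection EA, the step sizes and the test function are illustrative assumptions.

```python
import random
from typing import Callable, List

Vector = List[float]
Loss = Callable[[Vector], float]
Grad = Callable[[Vector], Vector]


def local_search(x: Vector, grad: Grad, lr: float = 0.1, steps: int = 5) -> Vector:
    """A few plain gradient-descent steps: the 'learning' of one individual."""
    for _ in range(steps):
        x = [xi - lr * gi for xi, gi in zip(x, grad(x))]
    return x


def evaluate(pop: List[Vector], f: Loss, grad: Grad, mode: str) -> List[float]:
    """Score every individual after learning; Lamarckian mode also writes the result back."""
    losses = []
    for k, genotype in enumerate(pop):
        learned = local_search(genotype, grad)
        losses.append(f(learned))  # both strategies select on the learned phenotype
        if mode == "lamarckian":
            pop[k] = learned  # acquired traits are inherited by the genotype
        # "baldwinian": the genotype is untouched; only its fitness reflects learning
    return losses


def hybrid_ea(f: Loss, grad: Grad, mode: str, dim: int = 2, pop_size: int = 10,
              generations: int = 20, sigma: float = 0.3, seed: int = 0) -> float:
    rng = random.Random(seed)
    pop = [[rng.uniform(-3.0, 3.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        losses = evaluate(pop, f, grad, mode)
        ranked = [x for _, x in sorted(zip(losses, pop), key=lambda t: t[0])]
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [[xi + rng.gauss(0.0, sigma) for xi in p] for p in parents]
        pop = parents + children
    return min(evaluate(pop, f, grad, mode))


def sphere(v: Vector) -> float:
    return sum(x * x for x in v)


def sphere_grad(v: Vector) -> Vector:
    return [2.0 * x for x in v]


for mode in ("lamarckian", "baldwinian"):
    print(mode, hybrid_ea(sphere, sphere_grad, mode))
```

In the Baldwinian mode the population is changed only through selection and mutation, so the underlying evolutionary mechanism stays Darwinian, which is why the schema theorem still applies to it.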

3.8.1 Fuzzy Systems with Evolutionary Algorithms

FISs are highly nonlinear systems with many input and output variables. A crucial issue in FISs is the generation of fuzzy rules. EAs can be employed for generating fuzzy rules and adjusting the MFs of fuzzy sets. Sufficient system information must be encoded, and the representation must be easy to evaluate and reproduce. A fuzzy rule base can be evolved by encoding the number of rules and the MFs comprising those rules into one chromosome. All the input and output variables and their corresponding MFs in the fuzzy rules are encoded. The genetic coding of each rule is the concatenation of the shape and location parameters of the MFs of all the variables. If the rules in two individuals are ordered differently, the rules should be aligned before reproduction; this leads to much less time for evolution [92]. In addition to the existing genetic operators, specific genetic operators such as rule insertion and rule deletion, where a whole rule is added or deleted at a specified point of the string, are also applied [92]. In [93], automatic optimal design of fuzzy systems is conducted by using the GESA. The ES approach is more suitable for the design of fuzzy systems because it uses direct coding for real-parameter optimization; consequently, the length of the objective vector increases linearly with the number of variables. Conversely, when the GA is employed, all the parameters need to be converted into fixed-length binary strings.
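As an illustration of the rule-base encoding described above, the sketch below assumes triangular MFs, each described by three real parameters (left foot, peak, right foot), so that a rule is the concatenation of one triangle per variable and the chromosome is the concatenation of all rules. Here the number of rules is implied by the chromosome length rather than stored as a separate gene; that simplification, like the function names, is an assumption.

```python
import random
from typing import List, Tuple

Triangle = Tuple[float, float, float]  # (left foot, peak, right foot)
Rule = List[Triangle]                  # one MF per variable


def encode(rules: List[Rule]) -> List[float]:
    """Concatenate the MF parameters of all variables of all rules into one real vector."""
    return [p for rule in rules for tri in rule for p in tri]


def decode(genes: List[float], n_vars: int) -> List[Rule]:
    """Inverse of encode; sorting each triple keeps a <= b <= c after mutation."""
    per_rule = 3 * n_vars
    if len(genes) % per_rule:
        raise ValueError("chromosome length must be a multiple of 3 * n_vars")
    return [[tuple(sorted(genes[r + k:r + k + 3])) for k in range(0, per_rule, 3)]
            for r in range(0, len(genes), per_rule)]


def membership(x: float, tri: Triangle) -> float:
    """Degree of membership of x in a triangular fuzzy set."""
    a, b, c = tri
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def mutate(genes: List[float], rng: random.Random, sigma: float = 0.05) -> List[float]:
    """Real-valued Gaussian mutation, as used with direct (ES-style) coding."""
    return [g + rng.gauss(0.0, sigma) for g in genes]


def delete_rule(genes: List[float], n_vars: int, index: int) -> List[float]:
    """Rule deletion operator: remove the whole rule at position `index`."""
    per_rule = 3 * n_vars
    return genes[:index * per_rule] + genes[(index + 1) * per_rule:]


rules = [[(0.0, 0.5, 1.0), (0.2, 0.4, 0.6)], [(1.0, 1.5, 2.0), (0.0, 0.1, 0.2)]]
chromosome = encode(rules)
print(len(chromosome), decode(mutate(chromosome, random.Random(0)), n_vars=2))
```

With this direct real coding the chromosome grows linearly with the number of variables (three genes per variable per rule), matching the observation above about the ES approach.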

3.8.2 Neural Networks with Evolutionary Algorithms 

Neural-network learning is a search process for the minimization of a criterion or error function. In order to make use of existing learning algorithms, one needs to select many parameters, such as the number of hidden layers, the number of units in each hidden layer, the type of learning rule, the transfer function, and the learning parameters. In general, an effective architecture is selected by hand, which is time consuming. Moreover, gradient-based algorithms usually have to be run multiple times to avoid local minima, and they require gradient information to be available. There are also adaptive methods for automatically constructing neural networks, such as the upstart algorithm [94] and the growing neural tree method [95]. Evolution can be introduced into neural networks at many different levels. The evolution of connection weights provides a global approach to connection weight training. When EAs are used to construct neural networks, a drastic reduction in development time and simpler designs can be achieved. The optimization capability of EAs can lead to a minimal configuration that reduces the total training time as well as the performing time for new patterns. In most cases, an individual in an EA represents a whole network, and competition occurs among these individual networks based on the performance of each network. Within this framework, an EA can be used to evolve the structure, the parameters, and/or the nonlinear activation function of a neural network. A recent survey on evolving neural networks is given in [96]. There are also occasions when EAs are used to evolve one hidden unit of the network at a time. In this case, the competing individuals are single units: during each successive run, candidate hidden units compete so that the optimal single unit is added in that run. The search space of the EA in each run is much smaller. However, the resulting set of hidden units may not be globally optimal.

Hybrid Training 

As far as the computation speed is concerned, it is hard to say whether EAs can compete 
with the gradient-descent method or not. There is no clear winner in terms of the best 
training algorithm, since the best method is always problem dependent. For large 
networks, EAs may be inefficient. When gradient information is readily available, it can 
be used to speed up the evolutionary search. The process of learning facilitates the 
process of evolution. The hybrid of evolution and gradient search is an effective 
alternative to the gradient-descent method in learning tasks, when the global optimum is 
at a premium. The hybrid method is more efficient than either an EA or the gradient- 
descent method used alone. As is well known, EAs are inefficient in fine tuning local 
search although they are good at global search. This is especially true for the GA. By 
incorporating a local-search procedure such as the gradient descent into the evolution, the 
efficiency of evolutionary training can be improved significantly. Neural networks can be trained by alternating two steps, where an EA step is first used to locate a near-optimal region in the weight space and a local-search step, such as a gradient-descent step, is then used to find a local-optimal solution in that region [88,97]. Hybridization of EAs and local search can be based either on the Lamarckian strategy or on the Baldwin effect. Hybrid methods based on the Lamarckian strategy and the Baldwin effect have been very successful, with numerous implementations. The Baldwin effect only alters the fitness landscape, and the basic evolutionary mechanism remains purely Darwinian. Thus, the schema theorem still applies to the Baldwin effect [91].

Evolving Network Parameters 

EAs are robust search and optimization techniques, and can locate the near-global optimum in a multimodal landscape. They can be used to optimize neural-network structure and parameters, or to optimize specific network performance and algorithmic parameters. EAs are suitable for learning networks with nondifferentiable activation functions. Considerable research has been conducted on the evolution of connection weights. EAs evolve network parameters such as weights based on a fitness measure for the whole network. The fitness function can usually be defined as $f = \frac{1}{1+E}$, where E is the error or criterion function for network training. The complete set of network parameters is coded as a chromosome s with a fitness function

$$f(s) = \frac{1}{1 + E(D(s))}$$

where D(s) is a decoding transformation.


Coding of network parameters is most important from the point of view of the convergence speed of search. When using the binary GA, fixed-point coding has been shown to be usually superior to floating-point coding of the parameters [97]. For crossover, it is usually better to exchange only whole parameters between two chromosomes rather than to change the bits within each parameter [97]. The modifications of network parameters can be conducted by mutation. Due to the limitations of binary coding, real numbers are usually used to represent network parameters directly [98]. Each individual is then a real vector, and crossover and mutation are specially defined for real-coded EAs.
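The fitness f(s) = 1/(1 + E(D(s))) and real-coded evolution can be combined in a short Python sketch. It assumes a 2-2-1 sigmoid network on the XOR problem, E as the sum of squared errors, and a (mu + lambda) evolution strategy with intermediate crossover and Gaussian mutation; none of these specific choices comes from the text.

```python
import math
import random
from typing import List, Tuple

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_GENES = 9  # 4 hidden weights + 2 hidden biases + 2 output weights + 1 output bias


def sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clip to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def decode(s: List[float]):
    """D(s): split the flat chromosome into the parameters of a 2-2-1 network."""
    return s[0:4], s[4:6], s[6:8], s[8]


def error(s: List[float]) -> float:
    """E(D(s)): sum of squared output errors over the four XOR patterns."""
    w_h, b_h, w_o, b_o = decode(s)
    total = 0.0
    for (x1, x2), target in XOR:
        h1 = sigmoid(w_h[0] * x1 + w_h[1] * x2 + b_h[0])
        h2 = sigmoid(w_h[2] * x1 + w_h[3] * x2 + b_h[1])
        y = sigmoid(w_o[0] * h1 + w_o[1] * h2 + b_o)
        total += (y - target) ** 2
    return total


def fitness(s: List[float]) -> float:
    return 1.0 / (1.0 + error(s))  # f(s) = 1 / (1 + E(D(s)))


def evolve(mu: int = 10, lam: int = 40, sigma: float = 0.5, generations: int = 200,
           seed: int = 0) -> Tuple[List[float], List[float]]:
    rng = random.Random(seed)
    parents = [[rng.gauss(0.0, 1.0) for _ in range(N_GENES)] for _ in range(mu)]
    history = [max(fitness(p) for p in parents)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            a, b = rng.sample(parents, 2)
            # Intermediate (arithmetic) crossover followed by Gaussian mutation.
            offspring.append([(x + y) / 2 + rng.gauss(0.0, sigma) for x, y in zip(a, b)])
        parents = sorted(parents + offspring, key=fitness, reverse=True)[:mu]  # (mu + lambda) survival
        history.append(fitness(parents[0]))
    return parents[0], history


weights, history = evolve()
print(f"best fitness: {history[0]:.3f} -> {history[-1]:.3f}")
```

No gradient is computed anywhere, which is why this style of training also works for nondifferentiable activation functions.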

The architecture of a neural network refers to its topological structure, i.e., its connectivity. The network architecture is usually predefined and fixed. Design of the optimal architecture can be treated as a search problem in the architecture space, where each point represents an architecture. Given certain performance criteria, such as minimal training error and lowest network complexity, the performance levels of all architectures form a discrete surface in the space. The performance surface is nondifferentiable because the number of nodes is discrete, and it is multimodal since different architectures may have similar performance. Direct and indirect encodings are two methods for encoding the architecture. In the direct encoding, every connection of the architecture is encoded into the chromosome. In the indirect encoding, only the most important parameters of an architecture, such as the number of hidden layers and the number of hidden units in each hidden layer, are encoded. When only the architecture of a network is evolved, the other parameters, such as the connection weights, have to be learned after a near-optimal architecture is found.
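The two encodings can be contrasted with a small sketch, assuming a feedforward network whose nodes are numbered so that connections only go from lower to higher indices. The direct encoding stores one bit per possible connection; the indirect encoding shown is a hypothetical layer-size list. Function names are illustrative.

```python
from typing import List


def direct_encode(adj: List[List[int]]) -> List[int]:
    """Direct encoding: one bit per possible feedforward connection i -> j with i < j."""
    n = len(adj)
    return [adj[i][j] for i in range(n) for j in range(i + 1, n)]


def direct_decode(bits: List[int], n: int) -> List[List[int]]:
    """Rebuild the upper-triangular connectivity matrix from the bit string."""
    adj = [[0] * n for _ in range(n)]
    it = iter(bits)
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = next(it)
    return adj


def indirect_decode(genes: List[int], n_inputs: int, n_outputs: int) -> List[int]:
    """Indirect encoding: genes = [number of hidden layers, units in layer 1, layer 2, ...]."""
    n_layers = genes[0]
    return [n_inputs] + list(genes[1:1 + n_layers]) + [n_outputs]


# Nodes 0 (input) -> 1, 2 (hidden) -> 3 (output)
adj = [[0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
print(direct_encode(adj))
print(indirect_decode([2, 5, 3, 9], n_inputs=4, n_outputs=1))  # unused trailing gene ignored
```

For n nodes the direct chromosome has n(n-1)/2 bits, so it grows quadratically with network size, whereas the indirect chromosome grows only with the number of layers; in both cases the weights still have to be trained separately.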

One major problem with the evolution of architectures without connection weights is noisy fitness evaluation [96]. The noise depends on the random initialization of the weights and on the training algorithm used; it is caused by the one-to-many mapping from genotypes to phenotypes. Thus, fitness evaluation is inaccurate when architectures are evolved without any weight information, and the evolution would be very inefficient. In [99], an improved GA is used for training a three-layer FNN with switches at its links. Both the nonlinear mapping and the network architecture can be learned: the weights of the links govern the input-output mapping, while the switches of the links govern the network architecture. In the genetic backpropagation (G-Prop) method [100], the GA selects the initial weights and changes the number of neurons in the hidden layer through the application of five specific genetic operators, namely, mutation, multipoint crossover, addition, elimination, and substitution of hidden units. BP, on the other hand, is used to train from these weights. This makes a clean division between global and local search. This strategy attempts to avoid Lamarckism. Behaviors of parents and their offspring are linked by various mutations, such as partial training and node splitting. The evolved neural network is parsimonious because node/connection deletion operations are preferred to node/connection addition operations. The hybrid training operator, which consists of a modified BP with adaptive learning rates and SA, is used for modifying the connection weights. After the evolution, the best evolved neural network is further trained using the modified BP on the combined training and validation set.

There are also studies on evolving node activation functions or learning rules, since different activation functions have different properties and different learning rules have different performance [96]. For example, the learning rate and the momentum factor of the BP algorithm can be evolved [101], and learning rules can be evolved to generate new learning rules [102]. EAs are also used to select proper input variables for neural networks from a raw data space of large dimension, that is, to evolve input features. The activation functions can be evolved by selecting among some popular nonlinear functions such as the Heaviside, sigmoidal, and Gaussian functions. A neural network with evolutionary neurons has been proposed in [103].

3.9 Conclusion 

In this chapter we have explored different aspects of soft computing methods. The potential and applicability of soft computing techniques were discussed in detail. The discussion started with the artificial neural network architecture and its various learning algorithms for training the network on a specified task. The backpropagation algorithm was discussed along with its capabilities and limitations. The different modifications of the backpropagation algorithm proposed by researchers were also discussed in detail.

Next, fuzzy logic, fuzzy rules, and fuzzy inference systems were discussed for accomplishing the task of pattern recognition, given their ability to deal with impreciseness, incompleteness and uncertainty in the input stimuli. Advanced fuzzy systems, such as complex fuzzy systems, were also explored. The discussion then proceeded to evolutionary algorithms. The immune algorithm and ant colony optimization were also discussed as new trends in evolutionary learning for determining the globally optimal solution. Finally, hybrid evolutionary architectures and algorithms were explained. The various methods that incorporate the genetic algorithm into neural networks were discussed, along with their advantages and efficiency in finding the globally optimal solution.
CHAPTER 4: Analysis of Pattern Classification for the Hand Written English
Alphabets with Soft Computing Approach. 

Abstract 

This chapter describes the analysis of pattern classification for handwritten English alphabets with a feedforward neural network. The neural network with two hybrid evolutionary algorithms (EAs) is used to analyze the performance for the given task. In the present work, we describe the simulation of two hybrid evolutionary algorithms applied to the feedforward neural network used in the classification of handwritten English alphabets. Besides the backpropagation algorithm, soft computing approaches in the form of a random genetic algorithm and a hybrid evolutionary algorithm have also been considered. The objective is to analyze the performance of the soft computing methods against the other algorithms discussed for the given task. To accomplish this, the experiments considered two different feedforward neural networks, each trained for five trials with the hybrid evolutionary algorithm and the random genetic algorithm. Each of these algorithms has a definite lead over the conventional neural network approaches to pattern recognition. The analysis shows that the feedforward neural network with the genetic algorithm achieves better generalization accuracy in character recognition problems. The problem of non-convergence of the weights in conventional backpropagation has also been eliminated by using the soft computing techniques. It has been observed that there is more than one converged weight matrix in character recognition for every training set. The results of the experiments show that the hybrid evolutionary algorithms can solve this challenging problem most reliably and efficiently.
4.1 Introduction

Pattern recognition aims to classify data (patterns) based on either a priori knowledge or 
on statistical information extracted from the patterns. The patterns to be classified are 
usually groups of measurements or observations, defining points in an appropriate 
multidimensional space. A complete pattern recognition system consists of a sensor that 
gathers the observations to be classified or described; a feature extraction mechanism that 
computes numeric or symbolic information from the observations; and a classification or 
description scheme that does the actual job of classifying or describing observations, 
relying on the extracted features [1]. 

Character recognition plays an important role in today's life [2]. It can help solve many complex real-life problems. An example of a character recognition task is the recognition of handwritten English alphabets. The classic difficulty in correctly recognizing even typed optical language symbols is the complex irregularity among pictorial representations of the same character due to variations in fonts, styles and sizes. This irregularity undoubtedly widens when one deals with handwritten characters [3]. Classification method designs are based on the following concepts:

Member-roster concept: Under this template-matching concept, a set of patterns
belonging to the same class is stored in the classification system. When an unknown
pattern is given as input, it is compared with the stored patterns and placed in the class
of the matching pattern.

Common property concept: In this concept, the common properties of the patterns of
each class are stored in the classification system. When an unknown pattern arrives, the
system compares its extracted properties with the common properties of the existing
classes and places the pattern (object) in the class that has similar common properties.

Clustering concept: Here, the patterns of the target classes are represented as vectors
whose components are real numbers, so an unknown pattern can be classified using the
clustering properties of these vectors. If the target vectors are far apart in their
geometrical arrangement, the unknown pattern is easy to classify. If they are close
together, or if the clusters overlap, more complex algorithms are needed to classify the
unknown patterns [4-5].
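As a minimal sketch of the clustering concept, assume each class is summarized by the mean (centroid) of its training vectors and that Euclidean distance is the measure of closeness; both choices are made here for illustration and are not prescribed by the text. An unknown pattern is then assigned to the class with the nearest centroid. The function names and the toy data are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_centroid(x, classes):
    """Assign x to the class whose centroid is closest (Euclidean distance).

    classes maps a class label to the list of its training vectors.
    """
    return min(classes, key=lambda label: math.dist(x, centroid(classes[label])))

# Two well-separated toy clusters (made-up data for illustration only).
classes = {
    "A": [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15]],
    "B": [[0.9, 0.8], [0.8, 0.9], [0.85, 0.85]],
}
print(nearest_centroid([0.2, 0.25], classes))  # A
print(nearest_centroid([0.7, 0.9], classes))   # B
```

When clusters overlap, a single centroid per class is no longer a reliable summary, which is where more complex algorithms become necessary.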

Bayesian decision theory provides a framework for minimizing the classification error. It
has a conceptual clarity that leads to an elegant numerical recipe, and it can deal with a
broader scope of stochastic models than classical algorithms [6]. The nearest neighbor
rule [7] has been used to classify handwritten characters; applying it requires a measure
of the distance between two character images. The rule works well when the target
patterns are far apart, and training is very fast. Linear classification or discrimination [5]
assigns a new point in a vector space to one of the classes separated by a boundary. It is
well suited to mixed data types, can also handle non-linear cases and missing data, and
produces results that are easy to interpret.
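The nearest neighbor rule can be sketched in a few lines, assuming each character image has already been reduced to a numeric feature vector and that Euclidean distance is the distance measure (both are assumptions; the text does not fix a particular distance). Every query is compared against all stored patterns, which is why query time grows with the size of the stored data set. Names and data are hypothetical.

```python
import math

def nearest_neighbor(x, training_set):
    """1-NN rule: return the label of the stored pattern closest to x.

    training_set is a list of (feature_vector, label) pairs; the whole list is
    scanned for every query.
    """
    _, label = min(training_set, key=lambda pair: math.dist(x, pair[0]))
    return label

# Made-up 4-component feature vectors standing in for character images.
training_set = [
    ([0.2, 0.1, 0.3, 0.4], "A"),
    ([0.8, 0.7, 0.6, 0.9], "B"),
    ([0.5, 0.5, 0.1, 0.1], "C"),
]
print(nearest_neighbor([0.25, 0.15, 0.3, 0.35], training_set))  # A
```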

All the methods mentioned above have their limitations. Bayesian decision theory has
computational difficulties, meaning that it is difficult to fill in the numerical details of
the method. The method is also obliged to use prior information; if that information is
unavailable, the theory does not work properly. The basic problem with the nearest
neighbor rule is that it requires a large set of data and its query time is very slow.
Because of this large data set it is also very prone to data errors, so if any irrelevant
information enters the system, the system can easily misinterpret the results. The same
problem of a large data set occurs with the linear classification method [8].

All these tasks can easily be accomplished by a human being without much effort, owing
to the complex structure and parallel working mechanism of the brain. The structure of
the biological neural network [9] has been modeled and simulated, in a serial fashion, as
an ANN that provides parallelism. One of the important advantages of an ANN is its
adaptive nature [10], and because of this property many existing paradigms can easily be
fused into it. A powerful attribute of a neural network is its ability to learn arbitrary
non-linear mappings using an appropriate learning rule. Once the ANN has been trained,
it can be used for pattern classification [11], pattern association, pattern mapping,
pattern grouping [11], feature mapping, and optimization and control. To accomplish
pattern classification and pattern mapping, a supervised multilayer feed-forward neural
network [12,13] is considered, with a non-linear differentiable activation function in all
processing units of the hidden and output layers. The processing units in the input layer
are linear, and their number corresponds to the dimensionality of the input pattern. The
number of output units corresponds to the number of distinct classes in the pattern
classification. A method has been developed [14] so that the network can be trained to
capture the mapping implicit in a set of input-output pattern pairs collected during an
experiment, and at the same time is expected to model the unknown system function
from which predictions can be made for new or untrained data. The predicted output
pattern class would then be approximately an interpolated version of the output pattern
classes corresponding to the training input patterns close to the given test input pattern.
This method involves the back-propagation learning rule [15], based on the principle of
gradient descent along the error surface in the weight space. This algorithm is used to
train a supervised multilayer feed-forward neural network, so that the network can
capture the implicit pattern and generate the classification for different features in the
given set of input-output pattern pairs.

Efficient learning by the back-propagation (BP) algorithm is required for many practical 
applications [16]. The BP algorithm calculates the weight changes of artificial neural 
networks, and a common approach is to use a two-term algorithm consisting of a learning 
rate (LR) and a momentum factor (MF). The major drawbacks of the two-term BP 
learning algorithm are the problems of local minima and slow convergence speeds, which 
limit the scope for real-time applications [17]. A local minimum is defined as a point
such that all points in a neighborhood have an error value greater than or equal to the
error value at that point [18]. A genetic algorithm (GA), however, is particularly good at
performing efficient search in large and complex spaces to find the global optimum. As
the complexity of the search space increases, the GA becomes an increasingly attractive
alternative to gradient-based techniques for solving many practical problems [19].
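The two-term update can be illustrated on a toy one-dimensional error surface. The sketch below assumes the common form Δw(t) = -η ∂E/∂w + α Δw(t - 1), with learning rate η = 0.1 and momentum α = 0.9 (the values listed in Table 4.2); the quadratic error function is purely hypothetical and has a single minimum, so it does not exhibit the local-minimum problem.

```python
def two_term_descent(grad, w0, lr=0.1, momentum=0.9, steps=500):
    """Gradient descent with a learning rate (LR) and a momentum factor (MF).

    Update: dw(t) = -lr * grad(w) + momentum * dw(t-1); then w <- w + dw(t).
    """
    w, dw = w0, 0.0
    for _ in range(steps):
        dw = -lr * grad(w) + momentum * dw
        w += dw
    return w

# Hypothetical error surface E(w) = (w - 3)^2, whose only minimum is at w = 3.
grad = lambda w: 2.0 * (w - 3.0)
w_final = two_term_descent(grad, w0=0.0)
print(round(w_final, 4))  # 3.0
```

With momentum 0.9 the iterates oscillate around the minimum before settling, which is one reason the convergence of the two-term algorithm can be slow.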

The experiments demonstrate that, for the given problem (recognition of handwritten
English alphabets), the hybrid evolutionary feed-forward neural network performs better
than the other algorithms discussed, in terms of both accuracy and rate of convergence.
We found a significant difference in accuracy and rate of convergence between the
hybrid evolutionary algorithm (hybrid genetic algorithm) and both the simple
back-propagation algorithm and the random genetic algorithm (random search algorithm)
for the given problem.

Section two of this chapter describes the general approaches and principles of the
algorithms applied to the problem. Section three describes the architecture and design of
the network used, in terms of simulation design. Section four shows the results of the
experiments. Section five discusses the results and the future work based on them, and
the last section presents the conclusion and summary of the chapter.

4.2 Supervised Feed Forward Neural Network

A feed-forward neural network is a biologically inspired classification algorithm. It
consists of a number of simple neuron-like processing units, organized in layers. Every
unit in a layer is connected with all the units in the previous layer. These connections are
not all equal; each connection may have a different strength or weight. The weights on
these connections encode the knowledge of a network. The units in a neural network are
often also called nodes. Data enters at the inputs and passes through the network, layer by
layer, until it arrives at the outputs. During normal operation, that is, when the network
acts as a classifier, there is no feedback between layers. This is why such networks are
called feed-forward neural networks. The network consists of an input layer of nodes,
one or more hidden layers, and an output layer. Each layer feeds the next, creating the
stacking effect. The nodes of the input layer have output functions that deliver data to the
first hidden layer nodes. The hidden layers are the processing layers, where the actual
computation takes place. Each node in a hidden layer computes a sum based on its input
from the previous layer (either the input layer or another hidden layer). The sum is then
"compacted" by a sigmoid function (a logistic curve), which maps the sum into a limited
and manageable range. The output from the hidden layers is passed on to the output
layer, which produces the final network result, as shown in Figure 4.1. Feed-forward
networks may contain any number of hidden layers. Depending on the complexity of the
problem, a network with a single hidden layer can learn any set of training data that a
network with multiple hidden layers can learn [20]; however, a network with a single
hidden layer may take longer to train.
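The layer-by-layer flow described above can be sketched as follows. This is a minimal illustration, assuming the logistic sigmoid in every hidden and output unit, a bias per unit, and random weights in [0, 1); the helper names `forward` and `random_layers` and the [4, 5, 5, 5] layer sizes are chosen for illustration only.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, layers):
    """Propagate input x through fully connected layers, one layer at a time.

    layers is a list of (weights, biases); weights[j][i] links input i to unit j.
    Each unit sums its weighted inputs plus its bias and squashes the sum with
    the logistic sigmoid, so every output lies in (0, 1).
    """
    activations = list(x)  # input units simply pass the data on
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

def random_layers(sizes, seed=0):
    """Random weights and biases in [0, 1) for layer sizes such as [4, 5, 5, 5]."""
    rng = random.Random(seed)
    return [
        ([[rng.random() for _ in range(n_in)] for _ in range(n_out)],
         [rng.random() for _ in range(n_out)])
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

out = forward([0.2, 0.4, 0.1, 0.3], random_layers([4, 5, 5, 5]))
print(len(out))  # 5 output units
```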


Figure 4.1: The functioning of neural network architecture.

An input may be either a raw or preprocessed signal or image. Alternatively, specific
features can be used; in that case their number and selection are crucial and
application-dependent. Weights connect each input to a summing node and scale its
contribution to the sum, so what the network has learned, and hence its quality, is
reflected in its weights. A bias is a constant input with its own weight. Usually the
weights are randomized at the beginning [21,22].
The neuron is the basic information-processing unit of an ANN. It consists of a set of
links describing the neuron inputs, with weights $w_1, w_2, \ldots, w_m$; an adder function
(linear combiner) for computing the weighted sum

$$ v = \sum_{j=1}^{m} w_j x_j \tag{4.1} $$

and an activation function (squashing function) for limiting the amplitude of the neuron
output, as shown in Figure 4.1:

$$ y = \varphi(v + b) \tag{4.2} $$

where $b$ is the bias. Equivalently, the bias can be treated as the weight of a constant
input $x_0 = +1$:

$$ v = \sum_{j=0}^{m} w_j x_j, \qquad w_0 = b \tag{4.3} $$

in which case $y = \varphi(v)$. The output at every node can finally be calculated using the
sigmoid function

$$ y = f(x) = \frac{1}{1 + e^{-kx}} \tag{4.4} $$

where $k = 1$ (constant).
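Equations 4.1 to 4.4 can be checked numerically with a single neuron. The sketch below (hypothetical function names, made-up input values) computes the output once with an explicit bias, as in equation 4.2, and once with the bias absorbed as the weight of a constant input $x_0 = +1$, as in equation 4.3, and confirms that the two forms agree.

```python
import math

def sigmoid(v, k=1.0):
    """Logistic squashing function f(v) = 1 / (1 + exp(-k v)), eq. (4.4)."""
    return 1.0 / (1.0 + math.exp(-k * v))

def neuron_output(x, w, b):
    """Eqs. (4.1)-(4.2): weighted sum v = sum_j w_j x_j, then y = f(v + b)."""
    v = sum(wj * xj for wj, xj in zip(w, x))
    return sigmoid(v + b)

def neuron_output_absorbed(x, w, b):
    """Eq. (4.3): prepend x_0 = +1 with weight w_0 = b, then y = f(v)."""
    x_ext, w_ext = [1.0] + list(x), [b] + list(w)
    return sigmoid(sum(wj * xj for wj, xj in zip(w_ext, x_ext)))

x, w, b = [0.5, -1.0, 2.0], [0.4, 0.3, -0.2], 0.1
y1, y2 = neuron_output(x, w, b), neuron_output_absorbed(x, w, b)
print(abs(y1 - y2) < 1e-12)  # True: both forms give the same output
```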

The feed-forward neural network uses a supervised learning algorithm: besides the input
pattern, the network also needs to know to which category the pattern belongs. Learning
proceeds as follows. A pattern is presented at the inputs and is transformed in its passage
through the layers of the network until it reaches the output layer. Each unit in the output
layer corresponds to a different category. The current outputs of the network are then
compared with the outputs it would ideally produce if the pattern were correctly
classified; in that ideal case, the unit for the correct category would have the largest
output value and the output values of the other output units would be very small. On the
basis of this comparison, all the connection weights are modified slightly so that, the
next time this same pattern is presented at the inputs, the value of the output unit that
corresponds to the correct category is a little higher than it is now and, at the same time,
the output values of all the other, incorrect outputs are a little lower than they are now.
The differences between the actual outputs and the idealized outputs are propagated back
from the top layer to lower layers, where they are used to modify the connection weights.
This is why the term back-propagation network is also often used to describe this type of
neural network.
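A minimal sketch of this learning procedure follows, assuming one hidden layer, sigmoid units, the squared error $E = \frac{1}{2}\sum_k (t_k - o_k)^2$ for a single pattern, and plain gradient descent without momentum. These are simplifications made for illustration; the experiments in this chapter use two hidden layers and momentum terms. The target vector marks the correct category with 1 and all others with 0, so training pushes the correct output up and the others down. All names are hypothetical.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_step(x, target, W1, b1, W2, b2, lr=0.5):
    """One back-propagation step for a 1-hidden-layer sigmoid network.

    Forward: h = f(W1 x + b1), o = f(W2 h + b2). Error: E = 1/2 sum (t - o)^2.
    Backward: output deltas d_o = (t - o) o (1 - o) are propagated to the hidden
    layer as d_h[j] = h[j] (1 - h[j]) sum_k d_o[k] W2[k][j]; every weight then
    moves a little in the direction that reduces E. Returns E before the update.
    """
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    o = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(W2, b2)]
    err = 0.5 * sum((t - ok) ** 2 for t, ok in zip(target, o))
    d_o = [(t - ok) * ok * (1 - ok) for t, ok in zip(target, o)]
    # Hidden deltas use the output weights *before* they are updated.
    d_h = [h[j] * (1 - h[j]) * sum(d_o[k] * W2[k][j] for k in range(len(W2)))
           for j in range(len(h))]
    for k, row in enumerate(W2):      # output-layer weights and biases
        for j in range(len(row)):
            row[j] += lr * d_o[k] * h[j]
        b2[k] += lr * d_o[k]
    for j, row in enumerate(W1):      # hidden-layer weights and biases
        for i in range(len(row)):
            row[i] += lr * d_h[j] * x[i]
        b1[j] += lr * d_h[j]
    return err

rng = random.Random(1)
W1 = [[rng.random() for _ in range(4)] for _ in range(5)]
b1 = [rng.random() for _ in range(5)]
W2 = [[rng.random() for _ in range(5)] for _ in range(5)]
b2 = [rng.random() for _ in range(5)]
x, target = [0.2, 0.4, 0.1, 0.3], [1, 0, 0, 0, 0]   # one-hot: category 1
errors = [train_step(x, target, W1, b1, W2, b2) for _ in range(200)]
print(errors[0] > errors[-1])  # True: the error shrinks
```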

A genetic algorithm (GA) [23] is a programming technique that mimics biological 
evolution as a problem-solving strategy. Given a specific problem to solve, the input to 
the GA is a set of potential solutions to that problem, encoded in some fashion, and a 
metric called a fitness function that allows each candidate to be quantitatively evaluated.
These candidates may be solutions already known to work, with the aim of the GA being 
to improve them, but more often they are generated at random. 

The GA then evaluates each candidate according to the fitness function. In a pool of
randomly generated candidates, of course, most will not work at all, and these will be
deleted. However, purely by chance, a few may hold promise: they may show activity,
even if only weak and imperfect activity, toward solving the problem. A GA differs from
more traditional optimization techniques in that it searches from a population of
solutions, not from a single point. Each iteration of a GA involves a competitive selection
that weeds out poor solutions. Solutions with high "fitness" are "recombined" with other
solutions by swapping parts of one solution with another. Solutions are also "mutated" by
making a small change to a single element of the solution. Recombination and mutation
are used to generate new solutions that are biased towards regions of the space in which
good solutions have already been seen [19, 20].

A GA maintains a population of individuals within the search space, each representing a
possible solution to a given problem. Each individual is coded as a finite-length vector of
components, or variables, in terms of some alphabet, usually the binary alphabet {0, 1}.
To continue the genetic analogy, these individuals are likened to chromosomes and the
variables are analogous to genes. Thus a chromosome (solution) is composed of several
genes (variables). A fitness score is assigned to each solution, representing the ability of
an individual to 'compete'. The individual with the optimal (or, more generally,
near-optimal) fitness score is sought. The GA aims to use 'selective breeding' of the
solutions to produce 'offspring' better than the parents by combining information from
the chromosomes.

The GA maintains a population of n chromosomes (solutions) with associated fitness
values. Parents are selected to mate on the basis of their fitness, producing offspring via a
reproductive plan. Consequently, highly fit solutions are given more opportunities to
reproduce, so that offspring inherit characteristics from each parent. As parents mate and
produce offspring, room must be made for the new arrivals, since the population is kept
at a static size. Individuals in the population die and are replaced with new solutions,
eventually creating a new generation once all mating opportunities in the old population
have been exhausted. In this way it is hoped that over successive generations better 
solutions will thrive while the least fit solutions die out. New generations of solutions are 
produced containing, on average, better genes than a typical solution in a previous 
generation. Each successive generation will contain more good 'partial solutions' than 
previous generations. Eventually, once the population has converged and is not producing 
offspring noticeably different from those in previous generations, the algorithm itself is 
said to have converged to a set of solutions to the problem at hand. 

After an initial population is randomly generated, the algorithm evolves through the three 
operators: 

1. selection, which equates to survival of the fittest;

2. crossover, which represents mating between individuals;

3. mutation, which introduces random modifications.
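The three operators can be combined into a small working GA. The sketch below is illustrative only: it assumes a toy "OneMax" fitness (count the 1 bits), tournament selection between two individuals, one-point crossover, bit-flip mutation and one elite individual carried over unchanged, none of which is prescribed by the text above. Function and parameter names are hypothetical.

```python
import random

def one_max(bits):
    """Toy fitness (assumption): the number of 1 bits; higher is better."""
    return sum(bits)

def genetic_algorithm(n_bits=20, pop_size=30, generations=60, p_mut=0.02, seed=0):
    """Minimal generational GA: selection, crossover and mutation on bit strings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select():
        # Selection: the fitter of two randomly drawn individuals survives.
        a, b = rng.sample(pop, 2)
        return a if one_max(a) >= one_max(b) else b

    for _ in range(generations):
        new_pop = [max(pop, key=one_max)]          # elitism: keep the best as-is
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randint(1, n_bits - 1)       # crossover: one random site
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=one_max)

best = genetic_algorithm()
print(one_max(best), "of", len(best), "bits set")
```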

The selection operator gives preference to better individuals, allowing them to pass on their
genes to the next generation. The goodness of each individual depends on its fitness. 
Fitness may be determined by an objective function or by a subjective judgment. 

The crossover operator is a prime factor distinguishing the GA from other optimization
techniques. Two individuals are chosen from the population using the selection operator.
A crossover site along the bit strings is randomly chosen, and the values of the two
strings are exchanged up to that site. For example, if S1 = 000000 and S2 = 111111 and
the crossover point is 2, then S1 = 110000 and S2 = 001111. The two new offspring
created from this mating are put into the next generation of the population. By
recombining portions of good individuals, this process is likely to create even better
individuals.
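The crossover example above can be reproduced directly. This sketch assumes the individuals are strings of characters and exchanges the segments before the crossover point; exchanging the segments after it instead yields the same pair of offspring. The function name is hypothetical.

```python
def single_point_crossover(s1, s2, point):
    """Exchange the segments of two bit strings that lie before the crossover point."""
    return s2[:point] + s1[point:], s1[:point] + s2[point:]

c1, c2 = single_point_crossover("000000", "111111", 2)
print(c1, c2)  # 110000 001111
```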

Just as mutation in living things changes one gene to another, mutation in a genetic
algorithm causes small alterations at single points in an individual's code. With some low
probability, a portion of the new individuals will have some of their bits flipped. Its
purpose is to maintain diversity within the population and inhibit premature convergence.
Mutation alone induces a random walk through the search space. Mutation and selection
(without crossover) create a parallel, noise-tolerant, hill-climbing algorithm.
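Bit-flip mutation can be sketched as follows, assuming each bit is flipped independently with a small probability `p_mut` (a hypothetical parameter name). The seeded generator only makes the example repeatable.

```python
import random

def bit_flip_mutation(bits, p_mut, rng):
    """Return a copy of bits in which each bit is flipped with probability p_mut."""
    return [1 - b if rng.random() < p_mut else b for b in bits]

rng = random.Random(42)
parent = [0] * 1000
child = bit_flip_mutation(parent, 0.01, rng)
print(sum(child))  # a handful of flipped bits, about 10 on average
```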

Popular training algorithms for feed-forward neural networks, such as the
back-propagation algorithm, suffer from the intrinsic complications of gradient-descent
techniques, predominantly local minima in the error function. GAs offer an alternative to
conventional techniques because they do not rely on gradient information: they can
sample the search space irrespective of where the existing solution is found, while
remaining biased toward good solutions.

Much work has already been done on the evolution of neural networks with hybrid GAs
[19, 20]. The majority of GA implementations are derivatives of Holland's [24] original
specification. Evolution has been introduced in NNs at three levels: connection weights,
architectures and learning rules. The evolution of learning rules has not yet been
subjected to a similar study, but the literature on the subject is growing. The evolution of
a network's connection weights is an area of interest and the focus of this chapter. Here
the GA is used to evolve a population of weights for the feed-forward neural network,
and each member of the population is evaluated with the fitness evaluation function. The
least mean square error function is used to evaluate each individual of the weight
population. The fittest weights are used for further computation and participate in the
classification. This process continues until the required classification is achieved. The
final selected weights represent the optimized connection strengths in the network
architecture, suitable for the classification of all presented patterns; these strengths
represent the convergence of the weights for the desired classification. It is also possible
that more than one population of weights generates the correct classification for every
presented training pattern [8, 9].

4.3 Simulation Design and Implementation Details 

We considered two different architectures. The first consists of 4 neurons in the input
layer, two hidden layers of 6 neurons each, and 5 neurons in the output layer, while the
second consists of 4 neurons in the input layer, two hidden layers of 5 neurons each, and
5 neurons in the output layer.

4.3.1 Experiment

Five different sets of alphabets are used for both experiments. The alphabets are
converted into their density function by a MATLAB program to obtain the input data. We
used three different learning algorithms: the back-propagation algorithm, the random
genetic algorithm (hybrid random search algorithm), and the hybrid evolutionary
algorithm (hybrid genetic algorithm). In each experiment we performed five trials of
training the neural network population with each of the three algorithms.
The genetic operators used in each experiment are summarized in Table 4.1:

Table 4.1: Genetic operators used in the experiments

Training Algorithm     Genetic Operators Used
Back-propagation       None
GA                     Mutation with probability <= 0.1 and crossover
Hybrid EA              Mutation with probability <= 0.1 and crossover


The parameters used in all three experiments are listed in Tables 4.2 and 4.3.

Table 4.2: Parameters used for the Back-propagation Algorithm

Parameter                                         Value
Back-propagation learning rate (η)                0.1
Momentum term (α)                                 0.9
Doug's momentum term                              1/(1 - α)
Adaptation rate (K)                               3.0
Minimum error existing in the network (maxe)      0.00001
Initial weights and bias values                   Randomly generated values between 0 and 1
Table 4.3: Parameters used for the Genetic Algorithm and the Hybrid Evolutionary Algorithm

Parameter                                         Value
Adaptation rate (K)                               3.0
Back-propagation learning rate (η)                0.1
Momentum descent term (α)                         0.9
Doug's momentum term                              1/(1 - α)
Mutation population size                          3
Crossover population size                         2000
Minimum error existing in the network (maxe)      0.00001
Initial weights and bias values                   Randomly generated values between 0 and 1


In all experiments, the neural networks were trained to generate the appropriate 
classification of the handwritten English alphabets. For this, the scanned images of five
different samples of handwritten English alphabets (Fig. 4.2) were obtained. 
Figure 4.2: Scanned images of five different samples of handwritten English alphabets

All collected images of handwritten alphabets were partitioned into four equal parts, and
the density values of the pixels of each part were calculated. Next, the density values at
the centres of gravity of these partitioned images were calculated. Consequently, four
values were obtained from each image of a handwritten English alphabet, and these were
used as the input to the feed-forward neural network. This procedure was used to present
the input pattern to the feed-forward neural network for each sample of the English
alphabets.
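The partitioning step can be sketched as below for a binary image stored as a list of rows. This is an assumption-laden illustration: it takes "density" to mean the fraction of ink pixels in each quadrant and stops there, so the subsequent centre-of-gravity step described in the text is not reproduced. The function name and the toy image are made up.

```python
def quadrant_densities(image):
    """Split a binary image (list of rows, 1 = ink) into four equal quadrants.

    Returns the fraction of ink pixels in each quadrant, in the order
    [top-left, top-right, bottom-left, bottom-right]. Assumes even height/width.
    """
    h, w = len(image), len(image[0])
    hh, hw = h // 2, w // 2
    feats = []
    for r0, c0 in [(0, 0), (0, hw), (hh, 0), (hh, hw)]:
        block = [image[r][c] for r in range(r0, r0 + hh) for c in range(c0, c0 + hw)]
        feats.append(sum(block) / len(block))
    return feats

# A 4x4 toy "character": ink only in the left column and the top row.
img = [
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(quadrant_densities(img))  # [0.75, 0.5, 0.5, 0.0]
```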

4.3.2 The Neural Network Architecture 

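The architecture of the neural networks was based on a fully connected feed-forward
multilayer generalized perceptron. The hidden layers were used to investigate the effects
of the algorithms on the hyperplane. Each unit $k$ of the output layer had the following
activation and output functions:

$$ A_k = \sum_{i=0}^{H} w_{ki} \, O_i^{H} \tag{4.5} $$

$$ O_k = f(A_k) \tag{4.6} $$

where the function $f$ is given as

$$ f(A_k) = \frac{1}{1 + e^{-K A_k}} \tag{4.7} $$

Here $O_i^{H}$ is the output of unit $i$ of the last hidden layer, which has $H$ units,
$O_0^{H} = 1$ supplies the bias term, and $K$ is the gain of the sigmoid. Now, similarly, the
activation and output values for the neurons of the hidden layers and the input layer can
be written as

$$ A_j^{H} = \sum_{i=0}^{N} w_{ji} \, O_i \tag{4.8} $$

and

$$ f(A_j^{H}) = \frac{1}{1 + e^{-K A_j^{H}}} \tag{4.9} $$

$$ O_j^{H} = f(A_j^{H}) \tag{4.10} $$

where $N$ is the number of units in the preceding layer.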

In the back-propagation learning algorithm, the weights were changed according to the
error calculated in the network after each training iteration. The change in weight and the
error in the network can be calculated as follows:

$$ \Delta w_{ij}(t) = -\eta \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}(t-1) \tag{4.11} $$

$$ w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t) \tag{4.12} $$

$$ E = \sum_{p=1}^{P} \left( O^{p} - T^{p} \right)^{2} \tag{4.13} $$

where $(O^{p} - T^{p})^{2}$ is the squared difference between the actual output value $O^{p}$ of
the output layer for pattern $p$ and the target output value $T^{p}$, and $P$ is the number of
training patterns. Doug's momentum term [3] was used with the momentum descent term
for calculating the change in weights in equations 4.11 and 4.12. Doug's momentum
descent is similar to standard momentum descent, except that the pre-momentum weight
step vector is bounded so that its length cannot exceed 1.0. After the momentum is added,
the length of the resulting weight change vector can grow as high as 1/(1 - momentum).
This allows stable behavior with much higher initial learning rates, so the learning rate
needs less adjustment as training progresses. The evolutionary algorithms evolve the
population of weights using their operators and select the best population of weights,
the one that minimizes the error between the desired output and the actual output of the
neural network.
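Doug's momentum, as described above, can be sketched as follows. The function name and the list-of-floats representation of the weight-change vector are hypothetical; the sketch shows only the two properties stated in the text: the pre-momentum step is clipped to length 1.0, and the accumulated weight change is then bounded by 1/(1 - momentum).

```python
import math

def dougs_momentum_step(grad, prev_delta, lr, momentum):
    """One weight-change step with Doug's momentum (sketch).

    The pre-momentum step -lr * grad is rescaled, if necessary, so that its
    Euclidean length is at most 1.0; the momentum term is then added as in
    ordinary momentum descent. Repeated steps can therefore reach a length of
    at most 1 / (1 - momentum).
    """
    step = [-lr * g for g in grad]
    norm = math.sqrt(sum(s * s for s in step))
    if norm > 1.0:
        step = [s / norm for s in step]
    return [s + momentum * d for s, d in zip(step, prev_delta)]

delta = [0.0, 0.0]
for _ in range(200):  # a huge, constant gradient
    delta = dougs_momentum_step([1e3, -1e3], delta, lr=0.1, momentum=0.9)
length = math.sqrt(sum(d * d for d in delta))
print(round(length, 3))  # 10.0, i.e. 1 / (1 - 0.9)
```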

4.4 The Genetic Algorithm Implementation 

Figure 4.3 shows the standard form of a genetic algorithm. The initial population was 
generated with randomly assigned values for weights and biases. The values were
obtained from a random number generator producing values between 0 and 1.
Figure 4.3: Flowchart of Genetic Algorithm Implementation

4.4.1 Genetic Representation 

After the initial population of weights and biases was created, the problem domain was
represented as a chromosome. For example, the set of weight values is represented as a
matrix (Table 4.4).
Each row of this matrix represents the group of incoming weighted links to a single
neuron. In total, there are 90 weighted links between neurons and 17 bias values of
neurons in the neural network [4-6-6-5], while the network [4-5-5-5] has 70 weighted
links between neurons and 15 bias values. Thus, a chromosome is a collection of genes,
each representing either a weight or a bias value: some genes correspond to single
weighted links and the others to the bias values of neurons in the network.
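The gene counts above follow from the layer sizes (90 + 17 = 107 genes and 70 + 15 = 85 genes), and the flattening of weight matrices into a chromosome can be sketched as follows. The gene order used here, each layer's weight rows followed by that layer's biases, is an assumption for illustration; the text does not specify it. Function names are hypothetical.

```python
def count_parameters(sizes):
    """Number of weighted links and bias values in a fully connected network."""
    links = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    biases = sum(sizes[1:])  # input units carry no bias
    return links, biases

def encode(layers):
    """Flatten [(weights, biases), ...] into one chromosome (a list of genes).
    Row j of a weight matrix holds the incoming links of neuron j."""
    genes = []
    for weights, biases in layers:
        for row in weights:
            genes.extend(row)
        genes.extend(biases)
    return genes

def decode(genes, sizes):
    """Inverse of encode: rebuild per-layer weight matrices and bias vectors."""
    layers, pos = [], 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [genes[pos + j * n_in: pos + (j + 1) * n_in] for j in range(n_out)]
        pos += n_in * n_out
        biases = genes[pos: pos + n_out]
        pos += n_out
        layers.append((weights, biases))
    return layers

for sizes in ([4, 6, 6, 5], [4, 5, 5, 5]):
    links, biases = count_parameters(sizes)
    print(sizes, links, biases, links + biases)
# [4, 6, 6, 5] 90 17 107
# [4, 5, 5, 5] 70 15 85
```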

4.4.2 The Mutation Operator

The mutation operator randomly selects a gene in a chromosome and adds a small random
value between -1 and 1 to that gene. Applying it repeatedly produces the
next-generation population of 107-gene chromosomes for the [4-6-6-5] network, or
85-gene chromosomes for the [4-5-5-5] network. If the mutation operator is applied n
times to the old chromosome, the next generated population has size n + 1 (the extra
member is the unchanged old chromosome, kept to meet elitism). Then we have

$$ C^{\mathrm{new}} = \{C^{\mathrm{old}}\} \cup \bigcup_{j=1}^{n} \left\{ C^{\mathrm{old}} \text{ with } C^{\mathrm{old}}_{\lambda_j} \leftarrow C^{\mathrm{old}}_{\lambda_j} + \epsilon_j \right\} \tag{4.14} $$

where $C^{\mathrm{old}}$ symbolizes the old 107-gene or 85-gene chromosome, $\epsilon_j$ symbolizes a
small randomly generated value between -1 and 1, $\lambda_j$ symbolizes the randomly selected
gene of the chromosome to which $\epsilon_j$ is added, and $C^{\mathrm{new}}$ symbolizes the
next-generation population of chromosomes. That is, the inner operation prepares a new
chromosome at each iteration of mutation, and the outer union builds the new population
of chromosomes, $C^{\mathrm{new}}$.
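A minimal sketch of the mutation operator of equation 4.14 follows, assuming the chromosome is a list of floats and that the extra member of the n + 1 population is the unchanged old chromosome (elitism). The function name is hypothetical.

```python
import random

def mutate_population(c_old, n, rng):
    """Eq. (4.14) sketch: build a population of n + 1 chromosomes.

    Each of the n mutants is a copy of c_old in which one randomly chosen gene
    (index lam) has a random value eps in [-1, 1] added to it. The unchanged
    c_old is kept as the extra member (elitism).
    """
    population = [list(c_old)]            # elitism: keep the old chromosome
    for _ in range(n):
        child = list(c_old)
        lam = rng.randrange(len(child))   # randomly selected gene
        eps = rng.uniform(-1.0, 1.0)      # small random value in [-1, 1]
        child[lam] += eps
        population.append(child)
    return population

rng = random.Random(0)
c_old = [rng.random() for _ in range(85)]  # an 85-gene [4-5-5-5] chromosome
c_new = mutate_population(c_old, n=3, rng=rng)
print(len(c_new))  # 4, i.e. n + 1
```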

4.4.3 Elitism 

Elitism was used when creating each generation so that the genetic operators did not lose
good solutions. This involved copying the best-encoded network unchanged into the new
population; that is, the old chromosome $C^{\mathrm{old}}$ is included when creating $C^{\mathrm{new}}$.

4.4.4 Selection 

The selection operator selects, from the mutated population of chromosomes, the
chromosome for which the sum of squared errors of the feed-forward neural network is
minimum. That is, the values of each chromosome are assigned in turn to the network
architecture as the weights and bias values defined in the chromosome. With these values
assigned, the network can produce outputs, and for each chromosome of $C^{\mathrm{new}}$ the error
is calculated from these outputs. The selection operator then picks the chromosome from
$C^{\mathrm{new}}$ that gives the minimum error for the network.
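The assign, evaluate and pick procedure can be sketched as below. It assumes sigmoid units, a bias per neuron and a hypothetical gene order (each layer's incoming weights, one row per neuron, followed by that layer's biases); the single training pattern and the two candidate chromosomes are made up. With all genes zero every unit outputs exactly 0.5, so that chromosome's error is 5 × 0.25 = 1.25.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def chromosome_sse(genes, sizes, patterns):
    """Assign a chromosome's genes to the network; return its sum of squared errors.

    Genes are read layer by layer: the incoming weights of each neuron (one row
    per neuron), then that layer's biases (an assumed gene order).
    patterns is a list of (input_vector, target_vector) pairs.
    """
    sse = 0.0
    for x, target in patterns:
        a, pos = list(x), 0
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            w = genes[pos: pos + n_in * n_out]
            b = genes[pos + n_in * n_out: pos + n_in * n_out + n_out]
            pos += n_in * n_out + n_out
            a = [sigmoid(sum(w[j * n_in + i] * a[i] for i in range(n_in)) + b[j])
                 for j in range(n_out)]
        sse += sum((t - o) ** 2 for t, o in zip(target, a))
    return sse

def select_best(population, sizes, patterns):
    """Selection: pick the chromosome whose network gives the minimum error."""
    return min(population, key=lambda c: chromosome_sse(c, sizes, patterns))

sizes = [4, 5, 5, 5]
patterns = [([0.2, 0.4, 0.1, 0.3], [1, 0, 0, 0, 0])]
population = [[0.0] * 85, [0.5] * 85]
best = select_best(population, sizes, patterns)
print(chromosome_sse(best, sizes, patterns))  # 1.25
```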

4.4.5 Crossover 

The crossover operator takes the selected chromosome and creates children to produce the
next-generation population of 107-gene or 85-gene chromosomes of size n + 1. This
next-generation population is obtained by applying the crossover operator n times. In the
experiments, crossover was performed by swapping the values of two randomly selected
genes of the parent chromosome, as shown in Figure 4.4, using

$$ C^{\mathrm{next}} = \{C^{\mathrm{sel}}\} \cup \bigcup_{j=1}^{n} \left\{ C^{\mathrm{sel}} \text{ with } C^{\mathrm{sel}}_{\alpha_j} \leftrightarrow C^{\mathrm{sel}}_{\beta_j} \right\} \tag{4.15} $$

where $\alpha_j$ and $\beta_j$ symbolize the randomly generated positions of the genes $C^{\mathrm{sel}}_{\alpha_j}$
and $C^{\mathrm{sel}}_{\beta_j}$ in the selected chromosome $C^{\mathrm{sel}}$, and $C^{\mathrm{next}}$ is the next generation, of
size n + 1.
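A minimal sketch of the gene-swap crossover of equation 4.15 follows, assuming a list-of-floats chromosome and that the (n + 1)-th member of the next generation is the selected parent itself. The function name is hypothetical.

```python
import random

def crossover_population(c_sel, n, rng):
    """Eq. (4.15) sketch: n children plus the selected parent (size n + 1).

    Each child is a copy of c_sel in which the values at two randomly chosen,
    distinct gene positions alpha and beta are swapped.
    """
    population = [list(c_sel)]
    for _ in range(n):
        child = list(c_sel)
        alpha, beta = rng.sample(range(len(child)), 2)  # two distinct positions
        child[alpha], child[beta] = child[beta], child[alpha]
        population.append(child)
    return population

rng = random.Random(0)
c_sel = [float(g) for g in range(8)]  # eight genes, as drawn in Figure 4.4
c_next = crossover_population(c_sel, n=5, rng=rng)
print(len(c_next))  # 6, i.e. n + 1
```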


Figure 4.4: (A) Chromosome before applying the crossover operator, (B) applying the
crossover operator on the chromosome, and (C) chromosome after applying the crossover
operator

4.4.6 Fitness Evaluation Function 

This step defines a fitness function for evaluating a chromosome's performance. This
function must estimate the performance of the weight population for a given
feed-forward neural network. We apply here a simple, practical function proportional to
the sum of squared errors. To evaluate the fitness of a given chromosome, each weight and
bias value contained in the chromosome is assigned to the respective link and neuron in
the network. The training set is then presented to the network, and the sum of squared
errors is calculated. The smaller the sum, the fitter the chromosome. In other words, the
genetic algorithm attempts to find a set of weights and bias values that minimizes the sum
of squared errors, as follows:

min_error = 1.0
For all the n + 1 chromosomes i:
    if (min_error > E_i) then
        min_error = E_i
        C_min = C_next[i]
    else
        min_error = min_error                                  (4.16)

where E_i symbolizes the error calculated for the i-th chromosome among the n + 1
chromosomes of the population, and C_min symbolizes the chromosome with the
minimized error.
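Equation 4.16 translates almost line for line into Python. In this sketch the errors E_i are assumed to have been computed already (for example by the fitness evaluation described above), and the starting value of 1.0 is kept as in the text, so a population whose errors all exceed 1.0 yields no selection. The function name is hypothetical.

```python
def min_error_chromosome(population, errors):
    """Eq. (4.16): scan all n + 1 chromosomes and keep the one with least error.

    errors[i] is the sum of squared errors E_i of chromosome i. As in (4.16),
    min_error starts at 1.0, so a chromosome is recorded only if its error is
    below 1.0; None is returned if no chromosome beats that threshold.
    """
    min_error, c_min = 1.0, None
    for chromosome, e in zip(population, errors):
        if min_error > e:
            min_error, c_min = e, chromosome
    return c_min, min_error

pop = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(min_error_chromosome(pop, [0.8, 0.25, 0.4]))  # ([0.3, 0.4], 0.25)
```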

4.5 Results and Discussion 

The results shown below consist of 22 tables (Table 4.5 to Table 4.26), with entries for the
number of iterations and the count of convergence weight matrices, and 22 graphs
(Figure 4.5 to Figure 4.26). Tables 4.5[a, b], 4.7[a, b], 4.9[a, b], 4.11[a, b], 4.13[a, b],
4.15[a, b], 4.17[a, b], 4.19[a, b], 4.21[a, b] and 4.23[a, b] contain the integer number of
iterations performed by each algorithm to recognize the five samples of each handwritten
English alphabet, while real values in these tables give the error remaining after 50000
iterations, meaning that the algorithm could not recognize the given sample of the
alphabet. Tables 4.6[a, b], 4.8[a, b], 4.10[a, b], 4.12[a, b], 4.14[a, b], 4.16[a, b],
4.18[a, b], 4.20[a, b], 4.22[a, b] and 4.24[a, b] contain the number of convergence weight
matrices obtained by each algorithm to recognize the given samples; the count of
convergence weight matrices is not shown for the back-propagation algorithm because it
could not converge to the final result in most cases. The graphs (Figure 4.5 to 4.24)
compare the performance of the different algorithms based on the results obtained. The
first network consists of one input layer with 4 neurons, two hidden layers with 5 neurons
each, and one output layer with 5 neurons [4-5-5-5], while the second network consists of
one input layer with 4 neurons, two hidden layers with 6 neurons each, and one output
layer with 5 neurons [4-6-6-5]. The graphs in Figures 4.5, 4.7, 4.9, 4.11, 4.13, 4.15, 4.17,
4.19, 4.21 and 4.23 plot the average number of iterations over the five samples of each
alphabet performed by the different algorithms in each trial; the value 50001 is used as
the iteration count wherever the algorithm could not converge to a solution for a sample.
The graphs in Figures 4.6, 4.8, 4.10, 4.12, 4.14, 4.16, 4.18, 4.20, 4.22 and 4.24 plot the
average number of convergence weight matrices for the five samples of the alphabets.
Table 4.5(a): Results of first trial of first network [4-5-5-5]

Alphabets        A     B     C     D     E     F     G     H     I     J     K     L     M

Iterations / error calculated by Back-Propagation Algorithm
Sample 1       0.3   0.4   0.3   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 2       0.3   0.4   0.3   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 3       0.3   0.4   0.3   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 4       0.4   0.4   0.3   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 5       0.4   0.4   0.3   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2

Iterations / error calculated by Genetic Algorithm
Sample 1       884   953     2    42  1694     3     5    12    74    57   262   514   441
Sample 2         1     2     1     1     2     1     1     1     1     2     2     2     2
Sample 3         1     2   469     1     2  1768     1     1    72     2     2    56    68
Sample 4         1     2     ?     ?     ?     ?     ?     ?     ?     ?     ?     ?     ?
Sample 5         1     2     1     1     2     ?     ?     ?     ?     ?     ?     ?     ?

Iterations / error calculated by Hybrid Evolutionary Algorithm
Sample 1        54   488    24  2428     1   0.1   0.3   0.1   0.2   0.3   0.3   0.2   0.2
Sample 2         1     1   120     1     1   0.1   0.2   0.2   0.2   0.3   0.3   0.2   0.2
Sample 3         1     1     1     ?     1   0.2   0.2   0.2   0.2   0.2   0.3   0.2   0.2
Sample 4         1     1     1     1     1   0.2   0.2   0.2   0.2   0.2   0.3   0.2   0.2
Sample 5         1     1     1     1     1   0.2   0.2   0.2   0.3   0.2   0.3   0.2   0.2

(? = entry illegible, or not attributable to a column, in the source scan.)
Table 4.5(b): Results of first trial of first network [4-5-5-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z

Iterations / error calculated by Back-Propagation Algorithm
Sample 1: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 2: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 3: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 4: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 5: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2

Iterations / error calculated by Genetic Algorithm
Sample 1: 175 | 1 | 2806 | 54 | 16 | 161 | 6 | 11 | 865 | 35 | 210 | 818 | 292
Sample 2: 1 | 1 | 609 | 2 | 1 | 2 | 313 | 1821 | 2 | 2 | 123 | 112 | 2

Sample 3 

1 

■ 

2 

■ 

■ 

35 

■ 

■ 

■ 



2. 

1166 

Sample 4 

■1 

1 

2 

1 

■ 

■ 

■ 



2 

4 

■ 

2 

Sample 5: 1 | 1 | 1 | 1 | 1 | 1 | 1 | 18 | 4 | 2 | 111 | 30 | 2

Iterations / error calculated for Hybrid Evolutionary Algorithm
Sample 1: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.1 | 0.3 | 0.1 | 0.2 | 0.3 | 0.3 | 0.2
Sample 2: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.1 | 0.2 | 0.2 | 0.2 | 0.3 | 0.3 | 0.2
Sample 3: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2
Sample 4: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2
Sample 5: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2 | 0.3 | 0.2
Figure 4.5: The Comparison Chart for iterations for recognition of handwritten
English alphabets for first trial of simulation for network [4-5-5-5].
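Each point in these comparison charts is an average over the five samples of one alphabet. The minimal Python sketch below shows that arithmetic. The substitution of 50001 for a non-converged run comes from the text. Reading a fractional table entry (for example 0.3) or an `err` entry as the mark of a run that did not converge is an assumption made here, and the function names are hypothetical. With the Hybrid Evolutionary Algorithm entries for alphabet A in Table 4.5(a) (54, 1, 1, 1, 1), the averaged value is 58 / 5 = 11.6. The Back-Propagation entries for the same alphabet are all fractional, so under this reading they average to 50001.

```python
NOT_CONVERGED = 50001  # iteration value the text substitutes for a non-converged run


def to_iterations(entry):
    """Map one table cell to an iteration count.

    Assumption: integer entries are iteration counts, while fractional
    entries (residual errors) and "err" cells mark runs that did not converge.
    """
    if isinstance(entry, str):  # e.g. an "err" cell
        return NOT_CONVERGED
    return int(entry) if float(entry).is_integer() else NOT_CONVERGED


def plotted_iterations(samples):
    """Average iteration count over the samples of one alphabet for one algorithm."""
    values = [to_iterations(e) for e in samples]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Alphabet A in Table 4.5(a): Hybrid Evolutionary and Back-Propagation columns.
    print(plotted_iterations([54, 1, 1, 1, 1]))
    print(plotted_iterations([0.3, 0.3, 0.3, 0.4, 0.4]))
```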



Table 4.6(a): Results of first trial of first network [4-5-5-5].


Alphabets: A | B | C | D | E | F | G | H | I | J | K | L | M

Number of convergence weight matrices for Genetic Algorithm
Sample 1: 1220 | 1130 | 1 | 1115 | 281 | 3 | 53 | 11 | 1477 | 1326 | 1582 | 1370 | 1682
Sample 2: 1026 | 1094 | 1 | 1122 | 1477 | ? | 60 | 12 | 1535 | 146 | 1529 | 1349 | 1687
Sample 3: 994 | 1162 | 1300 | 1127 | 317 | 1499 | 202 | 10 | 1422 | 1197 | 1565 | 168 | 1540
Sample 4: 172 | 1063 | 1152 | 1079 | 103 | 1542 | 1446 | 7 | 1473 | 1157 | 1572 | 1321 | 1501
Sample 5: 6 | 857 | 1177 | 1104 | 1083 | 1494 | 110 | 7 | 1574 | 1159 | 1604 | 1344 | 1330


Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1: 1739 | 1823 | 1785 | 1882 | 5
Sample 2: 1698 | 1888 | 1839 | 1895 | ?
Sample 3: 1692 | 141 | 175 | 1904 | 1731
Sample 4: 1674 | 1875 | 1767 | 1891 | 1734
Sample 5: 1687 | 1879 | 1733 | 1892 | 1708
The algorithm could not converge to the solution after alphabet 'E', so no convergence weight matrix was obtained for the remaining alphabets.
Table 4.6(b): Results of first trial of first network [4-5-5-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z

Number of convergence weight matrices for Genetic Algorithm
Sample 1: 1558 | 4 | 1408 | 989 | 12 | 20 | 7 | 10 | 1286 | 1434 | 58 | 1505 | 1473
Sample 2: 189 | 1612 | 25 | 147 | 7 | 1318 | 1394 | 1600 | 1301 | 1421 | 1355 | 1639 | 1438
Sample 3: 1546 | 1460 | 16 | 1334 | 11 | 1478 | 1301 | 1591 | 1466 | 1554 | 1345 | 1727 | 1410
Sample 4: 1402 | 1534 | 17 | 1098 | 9 | 1311 | 1443 | 1580 | 1511 | 1547 | 1380 | 1491 | 1417
Sample 5: 1498 | 1430 | 1390 | 1310 | 11 | 1468 | 1349 | 1748 | 1546 | 1597 | 997 | 1421 | 1248


Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Samples 1 to 5: The algorithm could not converge to the solution after alphabet 'E', so no convergence weight matrix was obtained for the remaining alphabets.


Figure 4.6: The Comparison Chart for convergence weight matrices for recognition of
handwritten English alphabets for first trial of simulation for network [4-5-5-5].
Table 4.7(a): Results of second trial of first network [4-5-5-5].


Alphabets: A | B | C | D | E | F | G | H | I | J | K | L | M


Iterations / error calculated for Back-Propagation Algorithm
Sample 1: 0.1 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 2: 0.3 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 3: 0.3 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 4: 0.4 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 5: 0.4 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2

Iterations / error calculated for Genetic Algorithm
Sample 1: 891 | 12 | 198 | 4 | 368 | 5 | 3 | 46 | 1091 | 51 | 538 | 1182 | 2
Sample 2: 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 47 | 1 | 2 | 1 | 2
Sample 3: 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 7
Sample 4: 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 1
Sample 5: 1 | 81 | 1 | 1 | 2418 | 1 | 195 | 1 | 8 | 4 | 1 | 1 | 4

Iterations / error calculated by Hybrid Evolutionary Algorithm
Sample 1: 25 | 670 | 928 | 378 | 0.3 | 0.1 | 0.3 | 0.1 | 0.2 | 0.3 | 0.3 | 0.2 | 0.2
Sample 2: 1 | 1 | 2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.2 | 0.2 | 0.3 | 0.3 | 0.2 | 0.2
Sample 3: 1 | 1 | 90 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2 | 0.2
Sample 4: 1 | 1 | 209 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2 | 0.2
Sample 5: 1 | 1 | 1 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2
Table 4.7(b): Results of second trial of first network [4-5-5-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z

Iterations / error calculated for Back-Propagation Algorithm
Sample 1: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 2: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 3: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 4: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 5: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2

Iterations / error calculated for Genetic Algorithm
Sample 1: 2 | 252 | 2274 | 572 | 708 | 399 | 6 | 1313 | 1 | 1126 | 163 | 306 | 320
Sample 2: 1697 | 1 | 159 | 1 | 1 | 1 | 1 | 2 | 1 | 2 | 2 | 1 | 2
Sample 3: 23 | 2 | 2 | 20 | 1 | 1 | 6 | 2 | 1 | 2 | 990 | 1 | 3
Sample 4: 12 | 124 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 2 | 90 | 1 | 2
Sample 5: 2 | 1 | 1 | 1 | 1 | 1 | 11 | 2 | 1 | 2 | 1 | 1 | 2

Iterations / error calculated for Hybrid Evolutionary Algorithm
Sample 1: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.1 | 0.3 | 0.1 | 0.2 | 0.3 | 0.3 | 0.2
Sample 2: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.1 | 0.2 | 0.2 | 0.2 | 0.3 | 0.3 | 0.2
Sample 3: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2
Sample 4: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2
Sample 5: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2 | 0.3 | 0.2
Figure 4.7: The Comparison Chart for iterations for recognition of handwritten English
alphabets for second trial of simulation for network [4-5-5-5].



Table 4.8(a): Results of second trial of first network [4-5-5-5].


Alphabets: A | B | C | D | E | F | G | H | I | J | K | L | M

Number of convergence weight matrices for Genetic Algorithm

Sample 1 

1383 

9 

891 

1 

17 

1 

11 

1328 

1479 

1426 

1480 

1417 


Sample 2: 1121 | 7 | 1042 | 4 | 1201 | 3 | 1445 | 1201 | 1405 | 1492 | 1384 | 1094 | 1366
Sample 3: 1063 | 4 | 982 | 10 | 1271 | 10 | 1238 | 1251 | 1520 | 1403 | 125 | 1256 | 1099
Sample 4: 1238 | 10 | 989 | 8 | 1344 | 7 | 1365 | 1167 | 1463 | 1514 | 1352 | 1294 | 1238
Sample 5: 25 | 889 | 960 | 9 | 1353 | 8 | 1377 | 1335 | 1428 | 1469 | 1192 | 1308 | 43


Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1: 4 | 1845 | 1601 | 172
Sample 2: 176 | 1833 | 1735
Sample 3: 1579 | 1832 | 1586
Sample 4: 43 | 1823 | 1595
Sample 5: 45 | 1893 | 1621
The algorithm could not converge to the solution after alphabet 'D', so no convergence weight matrix was obtained for the remaining alphabets.
Table 4.8(b): Results of second trial of first network [4-5-5-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z

Number of convergence weight matrices for Genetic Algorithm
Sample 1: 1 | 1706 | 1331 | 1582 | 1466 | 1392 | 5 | 1721 | 1 | 1503 | 1518 | 1420 | 207
Sample 2: 143 | 1694 | 1107 | 1479 | 1096 | 1240 | 11 | 1718 | 1 | 1467 | 1546 | 1418 | 513
Sample 3: 1413 | 1733 | 1065 | 1488 | 1341 | 1377 | 1026 | 1647 | 3 | 1376 | 1222 | 1492 | 1467
Sample 4: 1775 | 1677 | 1382 | 1620 | 1172 | 1167 | 161 | 1691 | 13 | 1360 | 1371 | 1542 | 1524
Sample 5: 1818 | 1727 | 1231 | 1433 | 1364 | 1368 | 1471 | 1704 | 10 | 1537 | 1205 | 1537 | 1518

Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Samples 1 to 5: The algorithm could not converge to the solution after alphabet 'D', so no convergence weight matrix was obtained for the remaining alphabets.


Figure 4.8: The Comparison Chart for convergence weight matrices for recognition of 
handwritten English alphabets for second trial of simulation for network [4-5-5-5]. 
Table 4.9(a): Results of third trial of first network [4-5-5-5].


Alphabets: A | B | C | D | E | F | G | H | I | J | K | L | M


Iterations / error calculated for Back-Propagation Algorithm
Sample 1: 0.4 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 2: 0.4 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 3: 0.4 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 4: 0.4 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 5: 0.4 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2


Iterations / error calculated for Genetic Algorithm 

Sample 1 | 2061 I 1186 1 SSH 1161 I 579 I 31 I 196 I 78 I 2883 I 912 I 86l 53^ 

Sample 2 1 1 USS 2 71 2 1 1 US I 978 3095 
Table 4.9(b): Results of third trial of first network [4-5-5-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z

Iterations / error calculated for Back-Propagation Algorithm
Sample 1: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 2: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 3: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 4: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 5: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2

Iterations / error calculated for Genetic Algorithm
Sample 1: 2440 | 580 | 1219 | 255 | 6 | 1 | 993 | 776 | 398 | 1 | 221 | 122 | 3

Sample 2 

11 

2 

31 

2 

73 

2 

2 

■ 

H 

2 

■ 



Samples 

2 

644 

2 

5419 

1 

1 

2 

■ 

■ 

■ 

■ 


■ 

Sample 4 

2 


180 

4 

1 

1 

2 

1 

9 

1 

1 

2 

2 

Sample 5: 2 | 2 | 2 | 2 | 1 | 1 | 12749 | 1 | 1 | 1 | 1 | 4 | 2

Iterations / error calculated for Hybrid Evolutionary Algorithm
Sample 1: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.1 | 0.3 | 0.1 | 0.2 | 0.3 | 0.3 | 0.2
Sample 2: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.1 | 0.2 | 0.2 | 0.2 | 0.3 | 0.3 | 0.2
Sample 3: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2
Sample 4: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2
Sample 5: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2 | 0.3 | 0.2
Table 4.10(b): Results of third trial of first network [4-5-5-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z

Number of convergence weight matrices for Genetic Algorithm
Sample 1: 1434 | 1606 | 1415 | 1332 | 4 | 2 | 62 | 1405 | 95 | 5 | 1191 | 1291 | 9
Sample 2: 1426 | 1628 | 1429 | 1346 | 194 | 10 | 1290 | 1405 | 1357 | 153 | 1260 | 1254 | 1291
Sample 3: 1532 | 1612 | 1376 | 1421 | 1221 | 1400 | 1261 | 1336 | 26 | 1364 | 1264 | 1338 | 283
Sample 4: 1533 | 1647 | 1326 | 1356 | 1300 | 1404 | 1379 | 1508 | 1539 | 1104 | 1218 | 1403 | 347
Sample 5: 1530 | 1650 | 1427 | 1185 | 1245 | 1436 | 1533 | 1393 | 1574 | 1407 | 1265 | 1374 | 1361


Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Samples 1 to 5: The algorithm could not converge to the solution after alphabet 'H', so no convergence weight matrix was obtained for the remaining alphabets.


Figure 4.12: The Comparison Chart for convergence weight matrices for recognition of
handwritten English alphabets for third trial of simulation for network [4-5-5-5].


Table 4.11(a): Results of fourth trial of first network [4-5-5-5].


Alphabets: A | B | C | D | E | F | G | H | I | J | K | L | M


Iterations / error calculated for Back-Propagation Algorithm
Sample 1: 3816 | 0.2 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 2: 1 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 3: 1 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 4: 1 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 5: 1 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2

Iterations / error calculated for Genetic Algorithm
Sample 1: 2159 | 492 | 286 | 1451 | 4983 | 169 | 80 | 5 | 4301 | 3146 | 6 | 8 | 747

Sample 2 

116 

■ 

2 

2 

1 

1 

■ 


2 

2 

2 

1 

1 

Sample 3: 1 | 1 | 649 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 1
Sample 4: 2 | 1 | 1 | 1 | 6 | 1 | 54 | 1 | 11 | 2 | ? | 1 | 210
Sample 5: 61 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1343 | 10 | 2 | 1 | 4

Iterations / error calculated by Hybrid Evolutionary Algorithm
Sample 1: 46 | 31 | 163 | 31 | 25 | 695 | 10 | 313 | 1 | 0.3 | 0.3 | 0.2 | 0.2
Sample 2: ? | 1 | 1 | 1 | 1 | ? | 1 | 1 | 1 | 0.3 | 0.3 | 0.2 | 0.2
Sample 3: 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.2 | 0.3 | 0.2 | 0.2
Sample 4: 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.2 | 0.3 | 0.2 | 0.2
Sample 5: 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.2 | 0.3 | 0.2 | 0.2


Table 4.11(b): Results of fourth trial of first network [4-5-5-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z


Iterations / error calculated for Back-Propagation Algorithm
Sample 1: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 2: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 3: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 4: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 5: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2

Iterations / error calculated for Genetic Algorithm
Sample 1: 158 | 672 | 40 | 2808 | 2 | 2 | 3 | 1 | 354 | 2 | 18 | 281 | 2
Sample 2: 1 | 17 | 1 | 4 | 69 | 1 | 1 | 1 | 2 | 39 | 1 | 1 | 1

Sample 3: 1 | 1 | 1 | 2 | 2 | ? | ? | ? | ? | 2 | ? | ? | ?
Sample 4: 1 | 1 | 59 | 2 | 18 | ? | ? | ? | 649 | 2 | ? | ? | ?

Sample 5 



1 

2 

2 

1 

■ 

■ 



■ 

■ 

im 

Iterations / error calculated for Hybrid Evolutionary Algorithm
Sample 1: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.1 | 0.3 | 0.1 | 0.2 | 0.3 | 0.3 | 0.2
Sample 2: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.1 | 0.2 | 0.2 | 0.2 | 0.3 | 0.3 | 0.2
Sample 3: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2
Sample 4: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2
Sample 5: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2 | 0.3 | 0.2
Figure 4.11: The Comparison Chart for iterations for recognition of handwritten
English alphabets for fourth trial of simulation for network [4-5-5-5]. 



Table 4.12(a): Results of fourth trial of first network [4-5-5-5].


Alphabets: A | B | C | D | E | F | G | H | I | J | K | L | M


Number of convergence weight matrices for Genetic Algorithm
Sample 1: 1277 | 1206 | 1476 | 1361 | 1480 | 1484 | 1371 | 9 | 1542 | 1638 | 1660 | 10 | 1299
Sample 2: 1130 | 1378 | 1509 | 1393 | 1520 | 1359 | 199 | 9 | 1614 | 1574 | 1580 | 12 | 1483
Sample 3: 1134 | 1254 | 124 | 1441 | 1570 | 1501 | 1284 | 14 | 1586 | 1516 | 1586 | 69 | 1241
Sample 4: 1168 | 1345 | 25 | 1401 | 1590 | 1353 | 1668 | 10 | 103 | 1552 | 1729 | 160 | 1407
Sample 5: 1136 | 1230 | 117 | 1489 | 1527 | 1495 | 1594 | 8 | 1484 | 1614 | 430 | 14 | 1486


Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1: 119 | 27 | 1942 | 1944 | 1909 | 97 | 1748 | 1745 | 1
Sample 2: 1667 | 1762 | 1922 | 1934 | 1902 | 1726 | 1744 | 1765 | 1664
Sample 3: 1632 | 1709 | 1941 | 1928 | 1890 | 1764 | 1750 | 1769 | 1689
Sample 4: 1685 | 1724 | 1942 | 1929 | 1888 | 1840 | 1741 | 1767 | 1666
Sample 5: 1627 | 1708 | 1922 | 1929 | 1875 | 1824 | 1755 | 1774 | 1664
The algorithm could not converge to the solution after alphabet 'O', so no convergence weight matrix was obtained for the remaining alphabets.
Table 4.12(b): Results of fourth trial of first network [4-5-5-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z


Number of convergence weight matrix for Genetic Algorithm 


Sample 1 

1146 

1494 

10 

1552 

3 

13 

1 

5 

151 

1493 

10 

1 1453 

Sample 2 

1100 

1032 

8 

145 

1195 

1566 

8 

9 

79 

1427 

11 

^ 1471 

Sample 3 

1184 

1005 

7 

1462 

1171 

1281 

6 

10 

1424 

1552 

17 

1433 

Sample 4 

1107 

1013 

1034 

1443 

1453 

1596 

10 

8 

10 

1547 

1267 

1504 

Sample 5 

1302 

1093 

107 

1424 

1293 

1216 

9 

i 11 

1481 

1508 

1331 

1461 


Number of convergence weight matrix for Hybrid Evolutionary Algorithm 
Table 4.13(a): Results of fifth trial of first network [4-5-5-5].


Alphabets: A | B | C | D | E | F | G | H | I | J | K | L | M

Iterations / error calculated for Back-Propagation Algorithm
Sample 1: 183 | 0.2 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 2: 1 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 3: 1 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 4: 1 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 5: 1 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2

Iterations / error calculated for Genetic Algorithm
Sample 1: 1096 | 2114 | 1953 | 2539 | 9194 | 72 | 1428 | 1128 | 2 | 5 | 28 | 99 | 2
Sample 2: 6 | 1 | 233 | 1 | 2 | 2 | 125 | 2 | 7 | 1 | 41 | 1 | 7
Sample 3: 57 | 1 | 82 | 67 | 3 | 9 | 28 | 2 | 2 | 1 | 1 | 1 | 1
Sample 4: 78 | 1 | 2 | 1 | 2 | 4 | 2 | 1 | 1 | 33 | 1 | 1 | 1
Sample 5: 2 | 1 | 2 | 1 | 7 | 13 | ? | 1 | 1 | 1 | 1 | 1 | 1

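Iterations / error calculated by Hybrid Evolutionary Algorithm
Sample 1: 63 | 42 | 471 | 87 | 509 | 14 | 320 | 6 | 5110 | 4 | 3 | 3 | 20
Sample 2: 2 | 2 | 91 | 2 | 2 | 1 | 2 | 1 | 1 | 2 | 2 | 1 | 1
Sample 3: 2 | 2 | 1 | 2 | 14 | 1 | 2 | 1 | 1 | 2 | 2 | 225 | 1
Sample 4: 2 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 1 | 2 | 1 | 1 | 1
Sample 5: 2 | 261 | 2 | 2 | 2 | 1 | 2 | 1 | 1 | 2 | 2 | 1 | 1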


Table 4.13(b): Results of fifth trial of first network [4-5-5-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z

Iterations / error calculated for Back-Propagation Algorithm
Sample 1: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 2: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 3: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 4: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 5: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2


Iterations /Error calculated for Genetic Algorithm 




1120

1505 

1651 

1623 

1553 

1181 

1350 

7 

1172 

1400 

184 

167 

63 

1526 

1594 

1654 

276 

1251 

45 

1263 

1373 

1371 

916 

94 

1143 

1385 

1477 

1518 

82 

1189 

1392 

1314 

1205 

94 
Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1: 1887 | 10 | 1646 | 1674 | 1674 | 1694 | 1713 | ? | 1921 | 10 | 37
Sample 2: 1891 | 1831 | 1697 | 1698 | 1696 | 1698 | 1731 | 1864 | 111 | 1672 | 1671
Sample 3: 1806 | 1853 | 1621 | 50 | 1613 | 1702 | 1705 | 1844 | 1753 | 73 | 1640
Sample 4: 1794 | 1866 | 1641 | 48 | 1610 | 1708 | 1728 | 1875 | 1709 | 84 | 2
Sample 5: 1876 | 1837 | 131 | 1641 | 1628 | 1674 | 1676 | 1856 | 141 | 1621 | 1659


1626 1492 


Number of convergence weight matrix for Genetic Algorithm 


1267 1358 


1667 1442 


Chapter 4: Analysis of Pattern Classification for Hand Written with Soft Computing Approach.

Figure 4.13: The Comparison Chart for iterations for recognition of handwritten 
English alphabets for fifth trial of simulation for network [4-5-5-5]. 
Table 4.14(a): Results of fifth trial of first network [4-5-5-5].


















Table 4.14(b): Results of fifth trial of first network [4-5-5-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z

Number of convergence weight matrices for Genetic Algorithm
Sample 1: 273 | 1387 | 1526 | 1668 | 1372 | 1343 | 1273 | 1312 | 1512 | 1637 | 1031 | 1562 | 1497
Sample 2: 1516 | 1575 | 1507 | 1653 | 1353 | 1199 | 1253 | 1363 | 1524 | 1287 | 138 | 1558 | 1521
Sample 3: 1491 | 1705 | 1389 | 119 | 1412 | 1398 | 1246 | 1362 | 1544 | 1544 | 63 | 1575 | 1484
Sample 4: 1534 | 1462 | 1477 | 1271 | 1352 | 1385 | 1207 | 1296 | 1231 | 1431 | 1271 | 38 | 1197
Sample 5: 1494 | 1460 | 1374 | 1394 | 1380 | 91 | 1159 | 1322 | 1417 | 1430 | 1137 | 1324 | 71

Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1: 121 | 1 | 1
Sample 2: 1692 | 1696 | 1674
Sample 3: 1661 | 1684
Sample 4: 1686 | 1590
Sample 5: 1674 | 46
The algorithm could not converge to the solution after alphabet 'P', so no convergence weight matrix was obtained for the remaining alphabets.



Figure 4.14: The Comparison Chart for convergence weight matrices for recognition
of handwritten English alphabets for fifth trial of simulation for network [4-5-5-5].


Table 4.15(a): Results of first trial of second network [4-6-6-5].


Alphabets: A | B | C | D | E | F | G | H | I | J | K | L | M


Iterations / error calculated for Back-Propagation Algorithm
Sample 1: 4810 | 0.3 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 103 | 315 | 0.3 | 0.2 | 0.3 | 0.2
Sample 2: 4 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 103 | 315 | 0.3 | 0.2 | 0.3 | 0.2
Sample 3: 4 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 103 | 315 | 0.3 | 0.2 | 0.3 | 0.2
Sample 4: 4 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 103 | 315 | 0.3 | 0.2 | 0.3 | 0.2
Sample 5: 4 | 0.4 | 0.3 | 0.4 | 0.3 | 0.3 | 0.2 | 103 | 315 | 0.3 | 0.2 | 0.3 | 0.2

Iterations / error calculated for Genetic Algorithm
Sample 1: 1556 | 15 | 52 | 295 | 25 | 11 | 17 | 123 | 37 | 46 | 8 | 96 | 12
Sample 2: 22758 | 8 | 5 | 71 | 294 | 40 | 40 | 36 | 7 | 77 | 40 | 56 | 15
Sample 3: 10 | 17 | 13 | 11 | 83 | 42 | 19 | 15 | 206 | 10 | 22 | 400 | 83
Sample 4: 30 | 55 | 932 | 23 | 20 | 20 | 22 | 51 | 1 | 28 | err | 27 | 75
Sample 5: 5203 | 706 | 241 | 31 | 4363 | 5 | 39 | 95 | 241 | 42 | err | 50 | err

Iterations / error calculated by Hybrid Evolutionary Algorithm
Sample 1: 24 | 2 | 32 | 3 | 52 | 89 | 2 | 164 | 23 | 1 | 2 | 4 | 1
Sample 2: 366 | 38 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 146 | 2 | 2

Sample 3 

1 

■ 


■ 

■ 

Ijn 


■ 




2 

15 



1 

■ 


■ 

2 

2 

I 

' 2 

1 

76 

2 

■1 


Table 4.15(b): Results of first trial of second network [4-6-6-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z

Iterations / error calculated for Back-Propagation Algorithm
Sample 1: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 2: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 3: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 4: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2
Sample 5: 0.2 | 0.1 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.2

Iterations / error calculated for Genetic Algorithm
Sample 1: 51 | 22 | 43 | 22 | 159 | 37 | 6562 | err | 239 | 18 | 168 | err | err
Sample 2: 191 | 13 | 99 | 15 | 53 | 34 | 70 | 340 | err | 17 | err | err | 144
Sample 3: 7 | 30 | 44 | 54 | 48 | 29 | 184 | 205 | 12 | 46 | 15 | err | 29
Sample 4: 11 | 7 | 483 | 580 | 17 | 122 | err | 147 | err | 27 | err | err | 888
Sample 5: 17 | 5 | 26 | 205 | 256 | 21 | err | 162 | err | 30 | 17 | 68 | err

Iterations / error calculated for Hybrid Evolutionary Algorithm
Sample 1: 2 | 1 | 12 | 2 | 6 | 20 | 82 | 52 | 430 | 2 | 5 | 1 | 4
Sample 2: 1 | 1 | 1 | 2 | 1 | 2 | 2 | ? | 2 | 2 | 1 | 1 | 1
Sample 3: 1 | 1 | 2 | 2 | 1 | 2 | 26 | 1 | 2 | 2 | 1 | 1 | 1
Sample 4: 1 | 1 | 2 | 1 | 1 | 2 | 2 | 1 | 2 | 1 | 1 | 1 | 1

Sample 5 

1 

1 

2 

1 

1 

2 

2 

1 

2 

1 

1 

1 


Figure 4.15: The Comparison Chart for iterations for recognition of handwritten 
English alphabets for first trial of simulation for network [4-6-6-5]. 



Table 4.16(a): Results of first trial of second network [4-6-6-5].


Alphabets: A | B | C | D | E | F | G | H | I | J | K | L | M

Number of convergence weight matrices for Genetic Algorithm
Sample 1: 1478 | 1 | 5 | 12 | 1 | 1 | 1 | 4 | 1 | 1 | 9 | 0 | 12

Sample 2 

1 


1 

8 

1 

5 

1 

6 

1 

16 

3 

9 

2 

Sample 3 

1 

4 

1 

1 

2 


1 

1 

1 

9 

20 

3 

5 

Sample 4 

6 


2 

3 

1 

1 

3 

1 

1 

3 


14 

5 

Sample 5 

1 


1 

2 

1 

1 

1 

25 

1 

7 


2 


Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1: 1762 | 1788 | 119 | 1 | 1919 | 1868 | 40 | 1785 | 1725 | 0 | 12 | 13 | 26
Sample 2: 1836 | 43 | 1715 | 1943 | 1899 | 1827 | 1858 | 1846 | 1782 | 22 | 1831 | 1559 | 1706
Sample 3: 1787 | 1947 | 1719 | 1934 | 1906 | 1838 | 1877 | 1805 | 1788 | 1683 | 1838 | 1584 | 1674
Sample 4: 1754 | 1939 | 1722 | 1938 | 1917 | 1843 | 1831 | 1833 | 1774 | 108 | ? | 1585 | 1705
Sample 5: 1792 | 1964 | 1709 | 1919 | 1906 | 1850 | 1856 | 1794 | 1770 | 1797 | 1696 | 1572 | 1690




Table 4.16(b): Results of first trial of second network [4-6-6-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z

Number of convergence weight matrices for Genetic Algorithm


Sample 1 

1 

2 

1 

1 

1 

0 

3 

0.5 

1 

78 

1 



Sample 2 

1 

10 

1 

1 

7 

4 

63 

1 


3 



1 

Sample 3 

4 

1 

1 

1 

1 

1 

1 

1 

7 

1809 

34 


15 

Sample 4 

6 

24 

11 

3 

1 

1 


1 


5 



1 

Sample 5 

5 

1 

5 

2 

1 

1558 


2 


39 

1 

1 


Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1: 10 | 21 | 1818 | 95 | 5 | 1835 | 1746 | 1794 | 1688 | 1680 | 1 | 10 | 37
Sample 2: 91 | 1675 | 1783 | 1776 | 80 | 1851 | 60 | 1681 | 1692 | 1 | 301 | 1631 | 1617
Sample 3: 1662 | 4 | 1814 | 1796 | 1725 | 1865 | 1754 | 1824 | 1683 | 1713 | 11 | 1573 | 1566
Sample 4: 1607 | 1714 | 1785 | 108 | 1693 | 1816 | 1824 | 1679 | 1705 | 63 | 1626 | 23 | 1613
Sample 5: 1677 | 1770 | 1743 | 134 | 1729 | 1829 | 1819 | 1806 | 1684 | 1690 | 1624 | 1567 | 1554


Figure 4.16: The Comparison Chart for convergence weight matrices for recognition
of handwritten English alphabets for first trial of simulation for network [4-6-6-5].


Table 4.17(a): Results of second trial of second network [4-6-6-5].


Alphabets: A | B | C | D | E | F | G | H | I | J | K | L | M

Iterations / error calculated for Back-Propagation Algorithm
Sample 1: 25364 | 0.4 | 0.3 | 0.3 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 2: 0.4 | 0.4 | 0.3 | 0.3 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 3: 0.4 | 0.4 | 0.3 | 0.3 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 4: 0.4 | 0.4 | 0.3 | 0.3 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2
Sample 5: 0.4 | 0.4 | 0.3 | 0.3 | 0.3 | 0.3 | 0.2 | 0.4 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2

Iterations / error calculated for Genetic Algorithm
Sample 1: 1707 | 301 | 4 | 3604 | 7254 | 2 | 7 | 34116 | 2100 | 33335 | 122 | 986 | 1
Sample 2: 25 | 1 | 1 | 2 | 139 | 2 | 1 | 44 | 11 | 2 | 1 | 2 | 1
Sample 3: 2 | 1 | 1 | 2 | 2 | 1 | 4 | 1 | 3 | 2 | 1 | 72 | 1
Sample 4: 2 | 2 | 1 | 4335 | 253 | 1771 | 1 | 1 | 1 | 2 | 1 | 4 | 2
Sample 5: 2 | 16 | 1 | 2 | 1 | 1 | 1 | 34 | 1 | 2 | 1 | 699 | 6

Iterations / error calculated by Hybrid Evolutionary Algorithm
Sample 1: 94 | 21 | 1 | 29 | 1 | 2 | 1 | 6 | 6 | 32 | 1274 | 99 | 2
Sample 2: 2 | 2 | 1 | 3 | 50 | 1 | 2 | ? | 1 | 1 | 2 | 1 | 1
Sample 3: 2 | 2 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 1
Sample 4: 2 | 2 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 467 | 38 | 1
Sample 5: 1 | 32 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 83 | 2 | 1


Table 4.17(b): Results of second trial of second network [4-6-6-5].


Alphabets: N | O | P | Q | R | S | T | U | V | W | X | Y | Z

Iterations / error calculated for Back-Propagation Algorithm
Sample 1: 0.2 | 0.1 | 315 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 3222 | 0.2 | 2684
Sample 2: 0.2 | 0.1 | 315 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 3222 | 0.2 | 2684
Sample 3: 0.2 | 0.1 | 315 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 3222 | 0.2 | 2684
Sample 4: 0.2 | 0.1 | 315 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 3222 | 0.2 | 2684
Sample 5: 0.2 | 0.1 | 315 | 0.3 | 0.3 | 0.2 | 0.3 | 0.2 | 0.2 | 0.1 | 3222 | 0.2 | 2684

Iterations /Error calculated for Genetic Algorithm j 

Sample 1 

143 

28 

14 

1483 

304 

647 

500 

3 

1075 

■2 

err 

3744 

16839 

Sample 2 

2 

1 

2 

4 

1 

2 

1 

1 

2 

3 

22323 

4 

2 

Sample 3 

2 

1 

1 

1 

1 

1 

1 

J 

1 

2 

2 

2 

1 

2057 

Sample 4 

2 

34 

1 

1 

■ 

■ 

■ 

■ 

2 

2 




Sample 5 

2 

52 

1 

2 

1 

1 

1 

1 

9 

1 

23295 

2186 

1 

Iterations /Error calculated for Hybrid Evolutionary Algorithm 

Sample 1 

35 

41 

3 

55 

1 

6 

3 

3 

864 

60 

I 

. I 

187 

Sample 2 

139 

1 

1 

2 

2 

2 

1 

2 

1 

2 

38 

1 

2 

Sample 3 

1 

1 

1 

40 

2 

2 

1 

2 

1 

2 

1 

I 

2 

Sample 4 

1 

1 

1 

2 

2 

27 

1 

2 

1 

2 

1 

; 1 

3 

Sample 5 

1 

1 

1 

2 

2 

2 

i 

2 

1 

2 

I 

1 

1 










Figure 4.17: The Comparison Chart for iterations for recognition of handwritten
English alphabets for second trial of simulation for network [4-6-6-5].



Table 4.18(a): Results of second trial of second network [4-6-6-5].

Alphabets      A     B     C     D     E     F     G     H     I     J     K     L     M

Number of convergence weight matrices for Genetic Algorithm
Sample 1     100  1384     9  1515  1730     1   189  1584  1718    24  1574  1511     1
Sample 2  (only partly present in the source; legible values in reading order: 1435, 1417, 16, 1555, 1623, 514, 1580)


















Table 4.18(b): Results of second trial of second network [4-6-6-5].

Alphabets      N     O     P     Q     R     S     T     U     V     W     X     Y     Z

Number of convergence weight matrices for Genetic Algorithm
Sample 1      90  1294    10  1390  1382  1561  1595    22  1808     1   0.2  1712   325
Sample 2    1303  1451     7  1411  1649  1671  1592   291  1384    69  1362  1652  1504
Sample 3     148   143     6  1293  1553  1684  1624  1598  1374   107    66   180  1712
Sample 4    1324   124    12  1171  1621  1697  1529  1536  1478    37  1522   297  1793
Sample 5    1393  1656     5    94  1558  1699  1683  1603  1525   179  1699  1502  1778

Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1    1761  1686     2  1892     0  1835     1     1  1787    40     0     ?  1841
Sample 2    1807  1690  1833  1900     1  1861   145  1790  1786  1752  1772    53  1843
Sample 3      97     5  1849  1826  1731  1853  1805  1781  1784  1768  1789  1901  1825
Sample 4    1837  1659  1823  1846  1788  1879  1807  1804  1772  1739  1780  1801  1805
Sample 5    1737  1721  1821  1822    27  1863  1844  1781  1835  1756  1812  1892  1807

(? = cell illegible in the source scan.)


Figure 4.18: The Comparison Chart for convergence weight matrices for 
recognition of handwritten English alphabets for second trial of simulation for 
network [4-6-6-5]. 


Table 4.19(a): Results of third trial of second network [4-6-6-5].

Alphabets      A     B     C     D     E     F     G     H     I     J     K     L     M

Iterations / error calculated for Back-Propagation Algorithm
Sample 1      32   721   482   0.3   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 2       ?    15   432   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 3       3    15  1419   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 4       3    15    31   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 5       3    15    31   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2

Iterations / error calculated for Genetic Algorithm
Sample 1    1584  9339   186     6  1821  1054    67    27   106   448     2  4632  1854
Sample 2    1849   404    18    24     ?     2     2     1     2     2     2  1134     2
Sample 3       1    20     2     1     ?     6    38     1   313     2     2     2     2
Sample 4       1    47     6     1     4   680     2     1    10     2   229     2     2
Sample 5       1     2     4     1     2    10     2     1     6     2   108  1071   819

Iterations / error calculated for Hybrid Evolutionary Algorithm
Sample 1      61     3    38   473   229     3   257     4    12    32   621     3     1
Sample 2       2     1     2     1     2     2     4    28     1     2     2     1     1
Sample 3      31     1     2     1     2     1     1     2     1     2     2     1   216
Sample 4       2     1    14     ?     2     1     1     2     1    36     2     1     2
Sample 5       2     1     1     1     2     1     1     2     1     1    16     1    34

(? = cell illegible or missing in the source scan.)


Table 4.19(b): Results of third trial of second network [4-6-6-5].

Alphabets      N     O     P     Q     R     S     T     U     V     W     X     Y     Z

Iterations / error calculated for Back-Propagation Algorithm
Sample 1     0.2   0.1   0.4   0.3   0.3   315   0.3   0.2   0.2  3222   0.3   0.2     ?
Sample 2     0.2   0.1   0.4   0.3   0.3   315   0.3   0.2   0.2  3222   0.3   0.2     ?
Sample 3     0.2   0.1   0.4   0.3   0.3   315   0.3   0.2   0.2  3222   0.3   0.2     ?
Sample 4     0.2   0.1   0.4   0.3   0.3   315   0.3   0.2   0.2  3222   0.3   0.2     ?
Sample 5     0.2   0.1   0.4   0.3   0.3   315   0.3   0.2   0.2  3222   0.3   0.2     ?

Iterations / error calculated for Genetic Algorithm
Sample 1       1    92   163   711  2310   147 37520   565   459    68  6238  1094     ?
Sample 2       2     2     6     2     2     1     1   126     2     2    10     1     4
Sample 3      72     2   279     2     6     1     1     2     1     2     2     1     ?
Sample 4       2     2    11   112     4     1     1     2     1   289   102     1     ?
Sample 5      10     2     1   708   802    10     1     1     1     2     2     2     ?

Iterations / error calculated for Hybrid Evolutionary Algorithm
Sample 1     198     6    40     5    22     1   203     1    26    93     3     1     ?
Sample 2       2     1     2     1     1     1     1     1     1   207     1     2     ?
Sample 3      42     1    39     1     1     1     1     1     1     2     1     2     ?
Sample 4       1     1     5     1     1     2     1     1     1     2     1     6     ?
Sample 5  (alignment unclear in the source; values in reading order: 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1)

(? = cell illegible or cut off at the page edge in the source scan.)


Figure 4.19: The Comparison chart for iterations for recognition of handwritten 
English alphabets for third trial of simulation for network [4-6-6-5]. 



Table 4.20(a): Results of third trial of second network [4-6-6-5].

Alphabets      A     B     C     D     E     F     G     H     I     J     K     L     M

Number of convergence weight matrices for Genetic Algorithm
Sample 1    1205  1602  1348    11  1449    11  1728     4   153  1639     5  1759    14
Sample 2    1463  1565    61  1226  1547  1555  1727    42  1277  1623  1709  1737    15
Sample 3     206  1509  1499  1170  1434  1725    28   206   113  1571  1684  1711    14
Sample 4    1466  1609  1573  1371  1354  1581  1687  1270  1434  1652  1549  1745    14
Sample 5      23  1484  1567  1145  1377  1721  1690  1420  1343  1637  1695  1649     ?

Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1      62     2   136  1923    69    13    50  1788  1810  1819  1786     2     ?
Sample 2    1508  1717  1832  1930  1755  1808  1750  1816  1784  1727  1857  1723     ?
Sample 3    1661  1719  1800  1919  1774     4  1709  1806  1842  1764  1878  1790     ?
Sample 4    1669  1690  1639  1928  1744  1830  1802  1768  1802  1755  1869  1730    18
Sample 5    1780  1709  1711  1924  1766  1850  1762  1713  1844  1771  1915  1797    18

(? = cell illegible or cut off at the page edge in the source scan.)


Table 4.20(b): Results of third trial of second network [4-6-6-5].

Alphabets      N     O     P     Q     R     S     T     U     V     W     X     Y     Z

Number of convergence weight matrices for Genetic Algorithm
Sample 1       1   199  1488  1366  1523  1519  1580  1521  1638  1539   155  1623  1706
Sample 2       8    33  1411  1371  1408  1626  1687  1647  1777  1642  1596  1588  1527
Sample 3    1451  1654  1222  1343  1397  1608  1619  1614  1721  1623  1661  1653  1516
Sample 4    1441  1591  1219  1298  1485  1557    47  1516  1786    88  1589  1553  1552
Sample 5  (alignment unclear in the source; values in reading order: 1609, 1635, 1213, 1623, 1521, 1565, 1488, 1460, 1719, 1456, 1626, 1, 1602, 1457)

Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1      87     2  1869   328   126    86  1874    29  1833  1801    21     ?  1780
Sample 2    1787  1853  1857  1852  1834  1876    44    58  1877  1786    41  1675  1651
Sample 3    1782  1812  1822  1889  1840  1882    64  1739  1817  1754  1746  1812  1763
Sample 4      47  1824  1809  1910  1830  1904    85  1712  1844  1754  1741  1878  1705
Sample 5    1711  1829  1824  1904  1826  1900  1778  1718  1821  1749  1847  1903  1753

(? = cell missing in the source scan.)


Figure 4.20: The Comparison Chart for convergence weight matrices for 
recognition of handwritten English alphabets for third trial of simulation for 
network [4-6-6-5]. 


Table 4.21(a): Results of fourth trial of second network [4-6-6-5].

Alphabets      A     B     C     D     E     F     G     H     I     J     K     L     M

Iterations / error calculated for Back-Propagation Algorithm
Sample 1      32    75  2629  5953   0.3   103   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 2       3     3    17     3   0.3   103   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 3       3     3    17     3   0.3   103   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 4       3     3    17     3   0.3   103   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 5       3     3    17     ?   0.3   103   0.2   0.4   0.3   0.3   0.2   0.3   0.2

Iterations / error calculated for Genetic Algorithm
Sample 1    2038   175  1471   535 11256  1965   868   282  4264     ?   175     7    89
Sample 2      20     2   148     2     2     1     4     2     4     3     2   135    17
Sample 3       2     7     9  1011     2     ?     1   198     2     4     2     3     2
Sample 4       2     2     1     1     2  3176     1     2     2     1   359   461     2
Sample 5       2    15     2     1     2    21     1     2    74     2     4     2    54

Iterations / error calculated for Hybrid Evolutionary Algorithm
Sample 1      35    33   846    51   173    42     2    56    70     2     2   462     1
Sample 2       2     2     1     2     2     2     2     1     3     2     1     2     2
Sample 3       2     2     1     2     2   192     2     1     2     1     1     2     2
Sample 4       4     2     1   198     2     1     2     1     2     1     2     2     2
Sample 5       2     2     1     2     2     2     2     1     2     1     8     2     2

(? = cell illegible or missing in the source scan.)


Table 4.21(b): Results of fourth trial of second network [4-6-6-5].

Alphabets      N     O     P     Q     R     S     T     U     V     W     X     Y     Z

Iterations / error calculated for Back-Propagation Algorithm
Sample 1     0.2   315   0.4   0.3   0.3   0.2   0.3   0.2   0.2   0.1   0.3   0.2   0.2
Sample 2     0.2   315   0.4   0.3   0.3   0.2   0.3   0.2   0.2   0.1   0.3   0.2   0.2
Sample 3     0.2   315   0.4   0.3   0.3   0.2   0.3   0.2   0.2   0.1   0.3   0.2   0.2
Sample 4     0.2   315   0.4   0.3   0.3   0.2   0.3   0.2   0.2   0.1   0.3   0.2   0.2
Sample 5     0.2   315   0.4   0.3   0.3   0.2   0.3   0.2   0.2   0.1   0.3   0.2   0.2

Iterations / error calculated for Genetic Algorithm
Sample 1     606    42    25     2  5886  1197 19588  6099 23641    90   247   490  5001
Sample 2       1     1     1     3     1    75   875     2     2     2     1     1     2
Sample 3      28     1     1    96     1     2   482     2     7     2     1     1   462
Sample 4       2     1     1    28     1   359     6  3669     2     2     1  1678     2
Sample 5       2     1     1     2     1     8     2    12     1     2     1     1     2

Iterations / error calculated for Hybrid Evolutionary Algorithm
Sample 1     511    73    24     1     1     1     3     1    38     1    13   121    57
Sample 2     760     2    40     1     1     1     2     1     1     1     2     1     1
Sample 3       2     2     1     1     1     1     2     1     1     2     2     1     1
Sample 4       1     3     1     1     1     1     2     1     1     2     2     1     1
Sample 5       1     2     1     1     1     1     2     1     1     1     2     1     1


Figure 4.21: The comparison chart for iterations for recognition of handwritten 
English alphabets for fourth trial of simulation for network [4-6-6-5]. 



Table 4.22(a): Results of fourth trial of second network [4-6-6-5].

Alphabets      A     B     C     D     E     F     G     H     I     J     K     L     M

Number of convergence weight matrices for Genetic Algorithm
Sample 1    1409  1387  1318   231  1796  1492  1230    36  1624     1  1676     9  1172
Sample 2     113  1417  1348   112  1780  1267  1338    68  1580    11  1723   340   121
Sample 3    1185  1395  1318  1282  1771  1433  1323  1403  1685  1602  1716  1491  1558
Sample 4    1176  1392    51  1236  1767  1468    30  1395  1591    40  1229  1454  1621
Sample 5    1216   140  1313  1397  1721  1574  1391  1431  1513  1473  1598  1389    17

Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1       1     1  1789  1868  1907  1787    72  1891  1781     8     1  1901    30
Sample 2       5  1844  1799  1859  1916  1889  1829  1899  1808  1811  1872  1835  1763
Sample 3       1  1785  1806  1870  1918  1843  1827   899  1826  1804  1802  1774  1682
Sample 4      14  1760  1816  1919  1899  1838  1839  1899  1803  1869  1815  1757  1747
Sample 5      13  1784  1722  1807  1931  1849  1847  1923  1835  1800  1808  1787  1691


Table 4.22(b): Results of fourth trial of second network [4-6-6-5].

Alphabets      N     O     P     Q     R     S     T     U     V     W     X     Y     Z

Number of convergence weight matrices for Genetic Algorithm
Sample 1    1129  1566    12     3    11  1666    53   108  1559    20  1388  1629  1678
Sample 2     100    80    24  1521  1530  1631  1660   183  1707  1774  1519  1557  1676
Sample 3    1509  1543    45  1385  1469  1498   128   212   154  1775  1580  1709   136
Sample 4    1284   225    21  1564  1561   140  1512  1735  1298  1783  1465   234  1485
Sample 5    1370  1554  1333  1559    67  1759  1517  1602  1699  1802  1538   152  1506

Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1    1766  1732  1937     4     ?    64     6   116  1822    72  1766  1801  1804
Sample 2    1777  1634  1888   113  1815    41  1605   104  1792    79  1762  1816  1854
Sample 3    1751  1649  1927  1759  1848  1817  1648   123  1811  1699  1767  1876  1770
Sample 4    1682  1831  1926  1777  1788  1743  1635  1698  1791  1685  1756  1895  1809
Sample 5    1729  1822  1937  1779  1829  1792  1638  1734  1820  1709  1782  1857  1812

(? = cell missing in the source scan.)


Figure 4.22: The Comparison Chart for convergence weight matrices for recognition 
of handwritten English alphabets for fourth trial of simulation for network [4-6-6-5]. 








Table 4.23(a): Results of fifth trial of second network [4-6-6-5].

Alphabets      A     B     C     D     E     F     G     H     I     J     K     L     M

Iterations / error calculated for Back-Propagation Algorithm
Sample 1      94  1459   222   0.2   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 2      31    23    17   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 3      31    23    17   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 4      31    17    17   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2
Sample 5     155    17    17   0.4   0.3   0.3   0.2   0.4   0.3   0.3   0.2   0.3   0.2

Iterations / error calculated for Genetic Algorithm
Sample 1    1956   166 11940    30    54  5419     2 19941  2203   101   223   358   412
Sample 2      24  3717  1886     2     1     1     2  7070     2     1     2     7   688
Sample 3       2    11    15     2     1     1    91     4     2     1     8     1     2
Sample 4      15   208     2     2     2     1   437     2     2     1     2     1   140
Sample 5       2     2     2     2     3     1     4     4     2     1     2     1     1

Iterations / error calculated for Hybrid Evolutionary Algorithm
Sample 1      82    97  1207     2     1     1     1    53     3     9   378     2   279
Sample 2       2     2     1     1     1     1     1     2     1     1     1     1     1
Sample 3       2     2     1     1     1     1     1     2     ?     2     1     1     1
Sample 4       2   210     1     1     1     1     ?     2     1     2     1     1     1
Sample 5  (largely illegible in the source; legible values in reading order: 2, 2, 1, 1, 1, 1, 604)

(? = cell illegible or missing in the source scan.)







Table 4.23(b): Results of fifth trial of second network [4-6-6-5].

Alphabets      N     O     P     Q     R     S     T     U     V     W     X     Y     Z

Iterations / error calculated for Back-Propagation Algorithm
Samples 1-5  (largely illegible in the source; each of the five rows retains, in reading order, the values 0.2, 0.1, 103, 0.2, 0.2, and the first row also shows 252)

Iterations / error calculated for Genetic Algorithm
Samples 1-3  (partly illegible; cell positions not recoverable; legible values in reading order: 45, 3942, 724, 1124, 3831, 8, 1117, 1, 2, 1, 1, 1, 2, 2, 1, 1, 8, 2, 1, 1)
Samples 4-5  (partly illegible; legible values in reading order: 48, 7, 64, 1, 1, 2, 22682, 1, 1, 2, 2, 1, 1, 2, 662, 2, 1)

Iterations / error calculated for Hybrid Evolutionary Algorithm
Samples 1-3  (partly illegible; legible values in reading order: 507, 17, 3, 3, 81, 218, 145, 25, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 30, 12, 1, 1, 1, 1, 21)
Sample 4       1     1     1     1     1     1    26     1     1     1     1     1     2
Sample 5       1     1     1     1     1     1     1     1     1     1     1     1     2


Figure 4.23: The comparison chart for iterations for recognition of handwritten 
English alphabets for fifth trial of simulation for network [4-6-6-5]. 



Table 4.24(a): Results of fifth trial of second network [4-6-6-5].

Alphabets      A     B     C     D     E     F     G     H     I     J     K     L     M

Number of convergence weight matrices for Genetic Algorithm
Sample 1    1004  1445  1531  1540  1395  1537     4     3  1725    21  1523  1701  1489
Sample 2    1165    37  1581  1382  1364  1499   207  1674  1709  1524  1589  1691  1452
Sample 3    1037  1486  1560  1368  1321    67    45  1544  1729  1582  1479    62  1500
Sample 4    1226  1282  1526  1525  1213  1452  1491  1536  1701  1516  1679  1684  1372
Sample 5    1339  1354  1535  1614   158    38  1645  1462  1579  1615  1615  1443  1531

Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1    1731  1964  1735     1    11     ?    10  1864  1684  1808  1649    16  1679
Sample 2    1760  1957  1698  1894    48  1919  1732  1928  1666  1945  1639  1732  1786
Sample 3    1735  1966  1767  1911  1694  1911  1719  1852  1650  1943  1680  1819  1727
Sample 4    1742  1923  1731  1855  1697  1917  1705  1809  1662  1935    28  1729  1718
Sample 5    1763  1861  1688  1887  1680  1891  1760  1878  1741  1885  1753  1811  1698

(? = cell illegible in the source scan.)










Table 4.24(b): Results of fifth trial of second network [4-6-6-5].

Alphabets      N     O     P     Q     R     S     T     U     V     W     X     Y     Z

Number of convergence weight matrices for Genetic Algorithm
Sample 1     107  1671   398   141  1652    62  1382    67  1119  1618    62  1516  1091
Sample 2     112  1684  1435   114  1647   208  1469  1485  1630   168  1721    46  1373
Sample 3     128  1663  1520  1579  1598  1627  1285  1172    78  1556  1690  1591  1332
Sample 4    1467  1695  1377  1795  1631  1681  1666  1473  1699  1655  1504  1593   204
Sample 5  (alignment unclear in the source; values as read, in order, some single digits possibly scan marks: 151, 1722, 1, 26, 1803, 1624, 1, 1635, 1485, 1388, 1275, 1, 1631, 1678, 1538, 1328)

Number of convergence weight matrices for Hybrid Evolutionary Algorithm
Sample 1  (alignment unclear in the source; values as read, in order: 1, 30, 1755, 1689, 1774, 1659, 90, 160, 1, 1788, 1757, 1734, 1, 1835)
Sample 2  (remaining cells missing in the source; legible values in reading order: 1764, 1690, 1817, 1783, 1743, 1670, 1799, 1711, 116, 1828)
Figure 4.24: The Comparison Chart for convergence weight matrices for recognition
of handwritten English alphabets for fifth trial of simulation for network [4-6-6-5].
Table 4.25: Average iterations of all alphabets for each trial of second network


Average iterations   Trial 1   Trial 2   Trial 3   Trial 4   Trial 5
GA                      6931      1688       687       767       782
Hyb_EA                    14        30        24        31        34

Figure 4.25: Comparison chart for average iterations for all alphabets for each trial
of second network [4-6-6-5].
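Table 4.25 is the basis for the claim that the hybrid evolutionary algorithm needs far fewer iterations than the genetic algorithm. As a quick arithmetic check, the ratio of the two averages can be computed per trial. This sketch only reuses the five pairs of numbers printed in the table; the names `GA_AVG`, `HYB_EA_AVG` and `speedup` are our own.

```python
# Average iterations per trial, copied from Table 4.25 (network [4-6-6-5]).
GA_AVG = [6931, 1688, 687, 767, 782]
HYB_EA_AVG = [14, 30, 24, 31, 34]


def speedup(ga, hyb):
    """Ratio of GA to hybrid-EA average iterations, one value per trial."""
    return [g / h for g, h in zip(ga, hyb)]


ratios = speedup(GA_AVG, HYB_EA_AVG)
for trial, r in enumerate(ratios, start=1):
    print(f"Trial {trial}: GA needs {r:.1f} times the iterations of Hyb_EA")

# Mean of the per-trial averages for each algorithm
mean_ga = sum(GA_AVG) / len(GA_AVG)
mean_hyb = sum(HYB_EA_AVG) / len(HYB_EA_AVG)
print(f"Mean over trials: GA {mean_ga:.1f}, Hyb_EA {mean_hyb:.1f}")
```

For these values the ratio falls from about 495 in trial 1 to 23 in trial 5, so the advantage of the hybrid algorithm is large in every trial but varies considerably between trials.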




Table 4.26: Average number of convergence weight matrices of all alphabets for each
trial of second network [4-6-6-5].

Average Counts Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 


Figure 4.26: Comparison chart for average number of convergence weight matrices
for all alphabets for each trial of second network [4-6-6-5].
4.6 Conclusion

Results of the experiments clearly show that the hybrid evolutionary algorithm takes fewer
iterations than the simple genetic algorithm. The back-propagation algorithm, on the other
hand, does not converge for this problem within 50,000 iterations, as it suffers from the
local minima of the unknown error surface of this problem. It has also been found that the
rate of convergence of the weight matrix for this problem is much higher for the hybrid
evolutionary algorithm than for the random genetic algorithm for the network with six
neurons in each of two hidden layers [4-6-6-5], as shown in tables 4.16 [a, b], 4.18 [a, b],
4.20 [a, b], 4.22 [a, b] and 4.24 [a, b]. However, the random genetic algorithm surprisingly
performs better than the hybrid evolutionary algorithm in the trials of the network with
five neurons in each of two hidden layers [4-5-5-5]; there the hybrid evolutionary
algorithm could not converge after the recognition of 4-5 alphabets, as shown in
figure 4.6 [a, b], 4.8 [a, b], 4.10 [a, b], 4.12 [a, b] and 4.14 [a, b]. These values show that a
large number of convergent weight matrices have been obtained for each character by
applying the genetic algorithm and the hybrid genetic algorithm. On the basis of the
above results, the following conclusions can be drawn:

1. The results clearly indicate that, for the classification of handwritten English
alphabets, a feed-forward neural network trained with the back-propagation algorithm
performs worse than a feed-forward neural network trained with the genetic algorithm.
The genetic algorithm and the hybrid genetic algorithm were found to perform better and
more accurately in all the simulations.

2. For the network [4-6-6-5], the hybrid evolutionary feed-forward neural network gives
more than one convergent weight matrix for every input pattern, in contrast to the
back-propagation feed-forward neural network and the random genetic algorithm. This
indicates a higher accuracy rate in pattern recognition with the hybrid evolutionary
feed-forward neural network.

3. The higher number of convergent weight matrices in the hybrid EA training process
suggests that this algorithm may not be trapped in false minima of the error surface.
Surprisingly, however, the first network, with five neurons in each of its two hidden
layers, was also found to be trapped in local minima when trained with the hybrid
evolutionary algorithm. This may also increase the possibility of misclassification for an
unknown test input pattern.

4. The hybrid evolutionary algorithm performs better than the genetic algorithm: it
requires fewer iterations (Table 4.25 and Figure 4.25) and achieves a higher rate of
convergence for character recognition than the simple genetic algorithm (Table 4.26 and
Figure 4.26).

5. The network with five neurons in each of its two hidden layers becomes trapped in local
minima of the error surface of previously learned patterns when trained with the hybrid
evolutionary algorithm. It is observed that after the recognition of 6-7 characters the
network fails to converge for the remaining alphabet samples, which means that the
network tends to settle in the local minima of the error surface of the previously trained
patterns. This is not the case for the second network, with six neurons in each of its two
hidden layers [4-6-6-5], which converges for almost every alphabet sample. It may
therefore be concluded that the network with six neurons in each of two hidden layers is
the optimal network among those tested.
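The conclusions above compare a plain genetic algorithm with a hybrid evolutionary algorithm and count "convergent weight matrices". This section does not spell out how the hybridisation works, so the following is only a minimal sketch under stated assumptions, not the author's implementation: a [4-6-6-5] sigmoid network is encoded as one flat chromosome of 107 weights and biases; the GA uses elitist truncation selection, uniform crossover and Gaussian mutation; and every child is refined by a short hill-climbing step as the local-search component (one common form of hybrid, swapped in here for illustration). Treating a chromosome whose error falls below a tolerance `tol` as a "convergent weight matrix" is also our assumption, and all names and parameter values are hypothetical.

```python
import numpy as np

LAYERS = (4, 6, 6, 5)  # input, two hidden layers of six neurons, output


def n_weights(layers=LAYERS):
    """Connection weights plus one bias per neuron, layer by layer."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers[:-1], layers[1:]))


def forward(w, x, layers=LAYERS):
    """Sigmoid feed-forward pass; w is one flat chromosome of weights."""
    a, i = np.asarray(x, dtype=float), 0
    for n_in, n_out in zip(layers[:-1], layers[1:]):
        W = w[i:i + n_in * n_out].reshape(n_in, n_out)
        i += n_in * n_out
        b = w[i:i + n_out]
        i += n_out
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))
    return a


def error(w, x, target):
    """Sum-of-squares error for one input/target pair."""
    return 0.5 * float(np.sum((forward(w, x) - target) ** 2))


def local_search(w, x, target, rng, steps=5, sigma=0.05):
    """Hill-climbing refinement: keep a small random change only if it lowers the error."""
    best, best_err = w, error(w, x, target)
    for _ in range(steps):
        cand = best + rng.normal(0.0, sigma, size=best.shape)
        cand_err = error(cand, x, target)
        if cand_err < best_err:
            best, best_err = cand, cand_err
    return best


def hybrid_ea(x, target, pop_size=20, generations=30, tol=0.01, seed=0):
    """Return the best error per generation and the count of chromosomes below tol."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, n_weights()))
    history = []
    for _ in range(generations):
        errs = np.array([error(w, x, target) for w in pop])
        order = np.argsort(errs)
        pop = pop[order]
        history.append(float(errs[order[0]]))
        parents = pop[: pop_size // 2]  # the better half survives unchanged (elitism)
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(p1.shape) < 0.5  # uniform crossover
            child = np.where(mask, p1, p2) + rng.normal(0.0, 0.1, size=p1.shape)
            children.append(local_search(child, x, target, rng))
        pop = np.vstack([parents, np.array(children)])
    final_errs = np.array([error(w, x, target) for w in pop])
    return history, int(np.sum(final_errs < tol))


# Hypothetical 4-feature input and 5-bit target code for one character
x = np.array([0.2, 0.7, 0.4, 0.9])
target = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
history, n_conv = hybrid_ea(x, target)
print(f"best error {history[0]:.3f} -> {history[-1]:.3f}; {n_conv} chromosomes below tol")
```

Because the better half of each generation is copied unchanged, the best error can never increase from one generation to the next; the final count plays the role of the number of convergent weight matrices for a single pattern.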
Chapter 5: Analysis for the Recognition of Scaled and Rotated Hand Written English
Alphabets with Online Learning Implementation Using Soft-Computing Techniques.

Abstract

This chapter describes the online learning performance of neural networks trained with
the genetic algorithm and the hybrid evolutionary algorithm on handwritten English
alphabets after scaling and rotation. The random genetic algorithm and the hybrid
evolutionary algorithm are used for training to recognize the scaled and rotated alphabets.
Genetic algorithms for hybrid neural networks show considerable potential in the field of
pattern recognition. We have taken two samples of each handwritten English alphabet,
one trained sample and one fresh (unknown) sample, after applying the standard
mathematical transformations for rotation and scaling of the sample along both the X and
Y axes. The random genetic algorithm and the hybrid evolutionary algorithm are applied
to train these samples. The network trained on straight alphabet samples has been used to
recognize the rotated and scaled samples from the set of five already trained straight
alphabet samples. The online learning algorithm combined with the evolutionary
algorithm has taken a definite lead over the conventional approaches of neural networks
and soft computing techniques. The results of the experiments clearly show that the
hybrid approach is efficient and reliable for training handwritten samples of any shape
and size.
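The abstract refers to applying mathematical transformations for scaling and rotation along the X and Y axes. The chapter does not give the exact procedure at this point, so the following is a minimal sketch under assumptions: each sample is a list of (x, y) points; it is first scaled by `sx` and `sy` about its centroid and then rotated by `theta_deg` degrees about the same point with the usual 2-D rotation matrix. The function name, the choice of centre and the example stroke are hypothetical.

```python
import math


def scale_and_rotate(points, sx, sy, theta_deg, center=None):
    """Scale a 2-D point set by (sx, sy), then rotate it by theta_deg degrees.

    Both operations are taken about `center` (default: the centroid), so the
    character stays in place in the image frame.
    """
    if center is None:
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
    else:
        cx, cy = center
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    out = []
    for x, y in points:
        dx, dy = (x - cx) * sx, (y - cy) * sy      # scaling along X and Y
        out.append((cx + c * dx - s * dy,          # rotation matrix
                    cy + s * dx + c * dy))         # [[cos, -sin], [sin, cos]]
    return out


# Hypothetical stroke points for a handwritten "L"
letter_l = [(0.0, 2.0), (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)]
print(scale_and_rotate(letter_l, 1.5, 0.8, 30.0))
```

When `sx` and `sy` differ, the order matters: scaling first and rotating second generally gives a different shape from rotating first, so the order used to build the training set should be applied consistently to the test patterns.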
5.1 Introduction

For more than thirty years, researchers have been working on handwriting recognition.
The present stage in the evolution of handwriting processing results from a combination of
several elements: improvements in recognition rates, the use of complex systems
integrating several kinds of information, the choice of relevant application domains, and
new technologies such as high-quality, high-speed scanners and inexpensive, powerful
processors [1]. Methods and recognition rates depend on the level of constraints on
handwriting. The constraints are mainly characterized by the type of handwriting, the
number of writers, the size of the vocabulary and the spatial layout. Recognition
strategies depend heavily on the nature of the character set to be recognized, and
recognition becomes more difficult as the constraints decrease. Intense activity was
devoted to the character recognition problem during the seventies and eighties, and fairly
good results were achieved [2]. Character recognition techniques can be classified
according to two criteria: the way preprocessing is performed on the data and the type of
decision algorithm. There have been reports of successful use of neural networks for
the recognition of handwritten characters [3-5], but it is very difficult to find any general
investigation that sheds light on a systematic approach to a complete neural network
system for the automatic recognition of cursive characters.

Traditionally, the recognition of handwritten characters has been considered important
for various applications in which automatic reading of handwritten text is needed. More
recently, the recognition of handwritten characters has also gained importance in
intelligent systems.
Handwritten characters may have different shapes and sizes: they may be straight, scaled
and/or rotated. A human can easily recognize a character of any shape and size, but for a
machine every alphabet becomes a new pattern after scaling and/or rotation. Recognition
of handwritten characters is therefore difficult, since the symbols can be of any size and
orientation in the image frame, and the recognition rate for rotated and/or scaled
handwritten characters is generally not as good as for straight characters.

The recognition rate depends on the number of writers and on their training: the more writers there are, the more difficult recognition can become. Several techniques exist for recognizing handwritten characters in straight form, but very little work has been done on scaled and rotated handwritten alphabets [6-10]. A feature-based approach has also been suggested that is insensitive to rotation, translation and scaling, and is also resistant to noise and distortion. Some work has been done on cursive Arabic characters [11, 12]. Nawwaf & Rabab [11] also proposed a simple and computationally efficient mapping for the recognition of handwritten Arabic characters, and explained that this mapping (taken together with other features of the character) can be used to build an effective recognition system that identifies each character uniquely.

For a large set of complex problems, such as handwritten characters after scaling and rotation, the recognition process depends mostly on the way the network is trained; efficient learning of a pattern largely depends on the training method. The learning of patterns can be divided into two basic principles: on-line learning and off-line learning. In off-line learning, all the given patterns are used together to determine the weights. In on-line learning, on the other hand, the information in each new pattern is incorporated into the network by incrementally adjusting the weights [13], so the network can update its information continuously. On-line learning is one of the most powerful and commonly used techniques for training large layered networks on large training sets, and it has been used successfully in many real-world applications. A combination of analytical methods provides more insight into existing algorithms and a deeper understanding of them, and leads to novel and principled proposals for their improvement.

The results for the recognition of straight handwritten English alphabets [14] clearly show that a feed-forward neural network trained with the backpropagation algorithm does not perform as well as a feed-forward neural network trained with a genetic algorithm. The genetic algorithm and the hybrid evolutionary algorithm were found to perform better and more accurately in all the simulations. The higher number of converged weight matrices in the hybrid EA training process suggests that this algorithm may not get trapped in false minima of the error surface [15]. The hybrid evolutionary algorithm was also found to perform better than the genetic algorithm for character recognition, requiring fewer iterations and achieving a higher rate of convergence. On comparing two different networks (the first with five neurons in each of two hidden layers, the second with six neurons in each of two hidden layers), the network with six neurons in each hidden layer was found to give better results, so this network is expected to be the optimum network [15].
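The difference between off-line and on-line learning can be illustrated with a deliberately simple case. The Python sketch below is only an illustration under stated assumptions: it trains a single linear neuron by plain gradient descent on a squared-error loss (not the GA/EA training used in this chapter), and all function names are hypothetical. The off-line version accumulates the gradient over all patterns and changes the weights once per epoch; the on-line version changes the weights right after every pattern.

```python
from typing import List, Sequence

Pattern = Sequence[float]

def predict(w: Sequence[float], x: Pattern) -> float:
    """Output of a linear neuron: the weighted sum of its inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))

def offline_epoch(w: List[float], patterns: Sequence[Pattern],
                  targets: Sequence[float], lr: float) -> List[float]:
    """Off-line (batch) learning: all patterns are used together.

    The gradient of 0.5 * error^2 is accumulated over the whole training
    set and the weights are changed once per epoch.
    """
    grad = [0.0] * len(w)
    for x, t in zip(patterns, targets):
        err = predict(w, x) - t
        for i, xi in enumerate(x):
            grad[i] += err * xi
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def online_epoch(w: List[float], patterns: Sequence[Pattern],
                 targets: Sequence[float], lr: float) -> List[float]:
    """On-line learning: the weights are adjusted incrementally,
    immediately after each pattern is presented."""
    w = list(w)
    for x, t in zip(patterns, targets):
        err = predict(w, x) - t
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    T = [1.0, 2.0, 3.0]          # consistent with the weights [1, 2]
    w_off, w_on = [0.0, 0.0], [0.0, 0.0]
    for _ in range(200):
        w_off = offline_epoch(w_off, X, T, lr=0.1)
        w_on = online_epoch(w_on, X, T, lr=0.1)
    print(w_off, w_on)           # both approach [1.0, 2.0]
```

Both modes reach the same weights here because the toy data are exactly consistent; the point of the sketch is only where the weight update happens.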
In this chapter, the neural network already trained on straight handwritten alphabets is used for training on, and recognition of, scaled and/or rotated alphabets. The scaled and/or rotated training samples are taken from the already-trained set of straight alphabets, while the test patterns are taken from the test pattern set of straight alphabets; these patterns are again scaled and rotated to form a new training pattern set for refinement. The new samples of each alphabet are then used as test patterns for this problem. Training is performed by the genetic algorithm and the hybrid evolutionary algorithm for five different trials of the trained network. On-line training by the hybrid evolutionary algorithm [16] was observed to perform better than the random genetic algorithm, requiring fewer iterations and yielding more convergence weight matrices. The recognition rate for scaled and/or rotated handwritten English alphabets was also found to be very high.

The second section of this chapter deals with the basic mathematics of image scaling and rotation. The third section describes the present status of the trained network, while the fourth section describes the design of the network and the implementation details. The fifth section presents the results of the experiments, the sixth section contains the discussion, and the seventh section outlines future work.

5.2 Scaling and Rotation 

The alphabets used as patterns were first scanned and then converted into bitmap images, so the patterns of handwritten English alphabets are treated as images.
In geometry and linear algebra, scaling is defined [17, 18] as a transformation that resizes an image along the X and Y axes (for two-dimensional objects). Scaling is the most obvious and common way of changing the size of an image: the content of the image is enlarged by increasing the number of pixels that represent it. Although the actual image pixels and colours are modified, the content represented by the image is essentially left unchanged; scaling can therefore change the pixel data of an image drastically while leaving what the image depicts essentially intact. An image can be represented as a pixel matrix, and a scaling can be represented by a scaling matrix. Rotation [17, 19], on the other hand, is a transformation in a plane that changes the orientation of an image. A rotation differs from a translation, which has no fixed points, and from a reflection, which "flips" the image it transforms.

The overall process of changing the shape, position and orientation of an image is known as image transformation. When the image is moved relative to a stationary coordinate system or background, the process is called a geometric transformation and is applied to each point of the image. When instead the coordinate system is moved relative to the image and the image is held stationary, the process is termed a coordinate transformation. In the process of transformation, every image is regarded as a set of points: every point P of the image has coordinates (x, y), and the complete image is defined as the totality of all its coordinate points [20].

Image rotation is a transformation in which the image is rotated by θ° about the origin [21]. The coordinates of the new point are given by $P' = R_\theta(P)$.
Figure 5.1: The diagrammatical representation of rotation.

Let the coordinates of a point P be (x, y), and let the coordinates of the point P' obtained after rotation be (x', y'). If P initially lies at an angle α from the x-axis, at a distance r from the origin, then

$$x' = r\cos(\alpha + \theta) = r\cos\alpha\cos\theta - r\sin\alpha\sin\theta$$

Since $x = r\cos\alpha$ and $y = r\sin\alpha$, therefore

$$x' = x\cos\theta - y\sin\theta \qquad (5.1)$$

and

$$y' = r\sin(\alpha + \theta) = r\sin\alpha\cos\theta + r\sin\theta\cos\alpha = x\sin\theta + y\cos\theta$$

So the rotation matrix can be defined as

$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

and multiplying the coordinate vector $[x \;\; y]^T$ of every point of the original image by this matrix gives the rotated image.
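The rotation matrix derived above can be checked numerically. This minimal Python sketch (function names hypothetical; angles in degrees, counter-clockwise, y axis pointing up) applies the rotation matrix to a list of coordinate points, which is how a rotation acts on the set-of-points view of an image.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]

def rotation_matrix(theta_deg: float) -> Matrix:
    """R_theta = [[cos, -sin], [sin, cos]]: counter-clockwise about the origin."""
    t = math.radians(theta_deg)
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]

def apply(matrix: Matrix, points: Sequence[Point]) -> List[Point]:
    """Multiply every coordinate vector (x, y) by a 2 x 2 matrix."""
    (a, b), (c, d) = matrix
    return [(a * x + b * y, c * x + d * y) for x, y in points]

if __name__ == "__main__":
    # Rotate three points of a (hypothetical) character by 30 degrees.
    print(apply(rotation_matrix(30), [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0)]))
```

Because a rotation preserves distances from the origin, the length of each rotated coordinate vector equals that of the original, which is a quick sanity check on any implementation.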
Scaling, by contrast, is the process of changing the size and proportions of the image [18]. Two factors are used in the scaling transformation, $S_x$ and $S_y$, where $S_x$ is the scale factor for the x coordinate and $S_y$ is the scale factor for the y coordinate. Let the coordinates of a point P be (x, y) and, after scaling, let the coordinates of the point P' be (x', y'); then

$$P' = S_{S_x, S_y}(P) \qquad (5.2)$$

where

$$x' = S_x \cdot x \quad \text{and} \quad y' = S_y \cdot y \qquad (5.3)$$

or, equivalently, the scaling matrix can be defined as

$$S = \begin{bmatrix} S_x & 0 \\ 0 & S_y \end{bmatrix}$$

so multiplying the coordinate vector of every point of the original image by the scaling matrix gives the scaled image.
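Scaling can be sketched in the same way. The Python example below is illustrative only; the helper names are hypothetical, and the factors Sx = 2, Sy = 3 in the usage line are one possible reading of the "2 by 3" entries in the result tables, which the text does not define. The sketch also shows how two 2 x 2 transformation matrices are composed into one by matrix multiplication: applying the product M·N to a point is the same as applying N first and then M.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]

def scaling_matrix(sx: float, sy: float) -> Matrix:
    """S = [[Sx, 0], [0, Sy]]: scale x by Sx and y by Sy."""
    return [[sx, 0.0],
            [0.0, sy]]

def matmul(m: Matrix, n: Matrix) -> Matrix:
    """2 x 2 matrix product; applying m @ n equals applying n, then m."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(matrix: Matrix, points: Sequence[Point]) -> List[Point]:
    """Multiply every coordinate vector (x, y) by a 2 x 2 matrix."""
    (a, b), (c, d) = matrix
    return [(a * x + b * y, c * x + d * y) for x, y in points]

if __name__ == "__main__":
    # Hypothetical factors Sx = 2, Sy = 3 applied to two points.
    print(apply(scaling_matrix(2, 3), [(1.0, 1.0), (0.5, 2.0)]))
```

Composing a scaling with a rotation in one matrix lets a combined "scale then rotate" transform be applied to all points in a single pass.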

As already mentioned, the alphabets were first scanned and then converted into bitmap images, so every alphabet is represented as an image. The scaling and rotation described above were applied using MATLAB. The shape and the density values of the alphabets change after rotation and/or scaling, as shown in Table 5.1:
Table 5.1: The comparison table showing the change in the nature of a character.

Alphabet sample          Density values matrix
Straight sample          2.466523  2.190908
                         2.350251  2.361397
Scaled (2 × 3) sample    2.409672  2.550000
                         2.455222  2.550000
Rotated by 30° sample    2.302031  2.443418
                         2.262767  2.399731
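The chapter applied scaling and rotation to whole bitmap images with MATLAB; the exact routines are not given. As a hedged stand-in, the Python sketch below transforms a binary bitmap by inverse mapping: each output pixel is mapped back through the inverse rotation and inverse scaling about the image centre and copies the nearest input pixel. The function name, the choice of the image centre as the fixed point and the nearest-neighbour rule are assumptions for illustration; the output canvas is sized for the scaling only, so corners of a rotated character may be clipped.

```python
import math
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 pixels, 1 = ink

def transform_bitmap(img: Bitmap, sx: float, sy: float, theta_deg: float) -> Bitmap:
    """Scale by (sx, sy), then rotate by theta (counter-clockwise) about the centre.

    Inverse mapping: every output pixel is mapped back into the input image
    and takes the value of the nearest input pixel (0 if it falls outside).
    """
    h, w = len(img), len(img[0])
    out_h, out_w = max(1, round(h * sy)), max(1, round(w * sx))
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    cy_in, cx_in = (h - 1) / 2, (w - 1) / 2
    cy_out, cx_out = (out_h - 1) / 2, (out_w - 1) / 2
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # coordinates relative to the output centre, y axis pointing up
            x, y = c - cx_out, cy_out - r
            # undo the rotation: R_theta inverse is R_(-theta)
            xr = x * cos_t + y * sin_t
            yr = -x * sin_t + y * cos_t
            # undo the scaling
            xs, ys = xr / sx, yr / sy
            src_c = round(xs + cx_in)
            src_r = round(cy_in - ys)
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = img[src_r][src_c]
    return out

if __name__ == "__main__":
    letter_l = [[1, 0, 0, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 0],
                [1, 1, 1, 0]]
    for row in transform_bitmap(letter_l, 1, 1, 30):
        print("".join("#" if v else "." for v in row))
```

Inverse mapping is used instead of pushing input pixels forward because it guarantees every output pixel receives a value, so enlarged or rotated characters have no holes.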


5.3 Present Status of the Trained Network 


We have already discussed [14] the problem of recognizing straight handwritten characters. The results for the training pattern set and the test pattern set were analyzed with three different algorithms (BP, random GA and hybrid EA). In this chapter we consider two neural network architectures. The training set of handwritten English alphabets consists of straight alphabets, i.e. without scaling and rotation, and two networks have been trained on this training set. Each network consists of an input layer, two hidden layers and an output layer. The two architectures differ in the number of neurons in each hidden layer: the first architecture uses five neurons in each hidden layer, while the second uses six. The analysis of training and testing for these two architectures with the three algorithms (BP, random GA and hybrid EA) suggested that the network with six neurons in each hidden layer performs better in terms of the number of iterations and the number of converged weight vectors. The performance of the network was also found to be better with the hybrid evolutionary algorithm than with backpropagation (BP) and the random genetic algorithm (GA). On the basis of this analysis, we use the neural network with six neurons in each of its two hidden layers (as shown in Figure 5.2) for training on the new pattern set.


Figure 5.2: The architecture of the neural network used (input layer, hidden layers and output layer).
The selected pattern set consists of handwritten English alphabets with scaling and rotation: the patterns that were already used for training are now scaled and rotated to construct the new training pattern set. We have shown [14] that, for the recognition of straight handwritten English alphabets (A to Z) over five trials for each training set, a total of approximately 1475 converged weight vectors were obtained with the hybrid evolutionary algorithm and approximately 965 with the random genetic algorithm. Thus 1475 and 965 optimal weight vectors, respectively, were explored; with any one of them, all the patterns of the training set were properly trained, and the performance was verified on the test pattern set. Hence, from this analysis, any one of these (converged or optimal) weight vectors can be selected for further training on a training set consisting of scaled and rotated forms of the previously used training set. The chromosome of the selected weight vector is shown in Figure 5.3.
Figure 5.3: The chromosome of the selected weight vector.
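The chromosome length of 107 genes quoted in Section 5.4.3 is consistent with the 4-6-6-5 architecture used here, provided every non-input neuron has one bias gene: 4·6 + 6·6 + 6·5 = 90 weights plus 6 + 6 + 5 = 17 biases. The Python sketch below decodes such a flat chromosome into per-layer weight matrices and bias vectors. The gene order (layer by layer, weights row by row, then that layer's biases) and the function names are assumptions, since the source does not state them.

```python
from typing import List, Sequence, Tuple

LAYER_SIZES = (4, 6, 6, 5)  # input, hidden 1, hidden 2, output

def chromosome_length(sizes: Sequence[int] = LAYER_SIZES) -> int:
    """Number of genes: one per weight plus one bias per non-input neuron."""
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

def decode(chromosome: Sequence[float], sizes: Sequence[int] = LAYER_SIZES
           ) -> List[Tuple[List[List[float]], List[float]]]:
    """Split a flat chromosome into (weight matrix, bias vector) per layer.

    Assumed gene order: for each layer, the weights row by row (one row per
    receiving neuron), followed by that layer's biases.
    """
    if len(chromosome) != chromosome_length(sizes):
        raise ValueError("chromosome length does not match the architecture")
    layers, pos = [], 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [list(chromosome[pos + j * n_in: pos + (j + 1) * n_in])
                   for j in range(n_out)]
        pos += n_in * n_out
        biases = list(chromosome[pos: pos + n_out])
        pos += n_out
        layers.append((weights, biases))
    return layers

if __name__ == "__main__":
    print(chromosome_length())   # 107
```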
The training set is drawn from the already-trained patterns of the existing training set of handwritten English alphabets: for each alphabet (A to Z), one of the already-trained patterns is selected and then scaled and rotated. This new pattern is used for further training with two algorithms (random GA and hybrid EA), and the performance of the network is analyzed as shown in the alphabet-wise result tables.

To accomplish the training, the genetic operators and parameters shown in Tables 5.2 and 5.3 have been used.

Table 5.2: Genetic operators used in the experiments

Training algorithm   Genetic operators used
GA                   Mutation with probability <= 0.1 and crossover
Hybrid EA            Mutation with probability <= 0.1 and crossover


Table 5.3: Parameters used for the genetic algorithm and the hybrid evolutionary algorithm

Parameter                                        Value
Adaptation rate (K)                              3.0
Learning rate (η)                                0.1
Momentum descent term (α)                        0.9
Doug's momentum term                             $\left[\dfrac{1}{1 - \alpha\,\Delta w}\right]$
Mutation population size                         3
Crossover population size                        2000
Minimum error existing in the network (max_e)    0.00001
Initial weights and bias values                  Randomly generated values between 0 and 1


5.4 Simulation Design and Implementation Details 

In this simulation design, as already mentioned, the network with four neurons in the input layer, two hidden layers of six neurons each, and five neurons in the output layer is trained with two algorithms, i.e. the random genetic algorithm and the hybrid evolutionary algorithm. The input/output pattern vectors are constructed from the existing training pattern set of straight alphabets, on which the network has already been trained for five trials of each pattern. One pattern vector is now selected for each alphabet, and scaling and rotation are applied to it at random. This modified pattern becomes a new pattern in the training set. The network is trained on this new training set starting from the already converged weight vectors; in this manner the network continuously adapts its behaviour to the changing training set [13]. The training pattern set thus grows continuously with newly modified patterns, and the network is expected to adapt to these continuous changes. Moreover, as training progresses, more and more new converged weight vectors will be explored. The performance of the network is analyzed for the random GA and the hybrid EA in terms of the number of iterations and the number of converged weight vectors. After training has been completed to a satisfactory level, a new randomly generated pattern (not used in the training set) is used as a test pattern, and the performance of the network is analyzed with both algorithms for the straight and the rotated/scaled test patterns.

5.4.1 Experiment 

In the previous work, five different sets of alphabets were used for the two networks with three different algorithms, i.e. BP, random GA and hybrid EA. The alphabets were converted into their density functions by a MATLAB program to obtain the input data. In each experiment, five trials were performed for training the neural network population. One of the five samples of each alphabet was picked at random, and scaling and rotation functions were applied to it through a MATLAB program. The scaled and rotated characters were converted into their density functions by the same MATLAB program that had been used earlier to find the density values of the straight alphabets, and these density values were then used as new input training patterns. For the new test pattern set, new alphabets (not from the trained set of straight alphabets) were picked, and the same process was repeated to generate the new input test patterns.

To convert each input alphabet sample into density values, the sample was partitioned into four equal parts and the density of the pixels in each part was calculated. Next, the density values at the centres of gravity of these partitions were calculated. Consequently, four values were obtained from each sample, and these were used as the input pattern for the feed-forward neural network. This procedure was used to present the input pattern to the feed-forward neural network for each of the samples.
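The feature extraction step can be sketched as follows, with an important caveat: the text does not give the exact density formula used by the MATLAB program (the values in Table 5.1 are larger than 1, so they are not simple pixel fractions). This Python sketch is therefore only a stand-in. It partitions a binary bitmap into four quadrants, uses the fraction of ink pixels in each quadrant as that quadrant's density, and also computes each quadrant's centre of gravity, which the text mentions but whose exact role in the four values is not specified. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # 1 = ink pixel, 0 = background

def quadrants(img: Bitmap) -> List[Bitmap]:
    """Split a bitmap into top-left, top-right, bottom-left and bottom-right
    parts (the parts are equal when height and width are even)."""
    h, w = len(img), len(img[0])
    mh, mw = h // 2, w // 2
    return [[row[:mw] for row in img[:mh]], [row[mw:] for row in img[:mh]],
            [row[:mw] for row in img[mh:]], [row[mw:] for row in img[mh:]]]

def density(part: Bitmap) -> float:
    """Stand-in density measure: fraction of ink pixels in the part."""
    total = sum(len(row) for row in part)
    return sum(map(sum, part)) / total if total else 0.0

def centre_of_gravity(part: Bitmap) -> Optional[Tuple[float, float]]:
    """(row, column) centroid of the ink pixels, or None if there are none."""
    ink = [(r, c) for r, row in enumerate(part) for c, v in enumerate(row) if v]
    if not ink:
        return None
    n = len(ink)
    return (sum(r for r, _ in ink) / n, sum(c for _, c in ink) / n)

def density_features(img: Bitmap) -> List[float]:
    """Four input values for the 4-input network, one per quadrant."""
    return [density(q) for q in quadrants(img)]

if __name__ == "__main__":
    sample = [[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 1]]
    print(density_features(sample))
```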

5.4.2 The Neural Network Architecture 

The architecture of the neural networks was based on a fully connected feed-forward multilayer generalized perceptron [13]. The hidden layers were used to investigate the effects of the algorithms on the hyperplane. The output units of each network had the following activation and output functions:


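$$A_k^{o} = \sum_{i=0}^{H} w_{ik}^{ho}\, O_i^{h} \qquad (5.3)$$

$$O_k^{o} = f\!\left(A_k^{o}\right) = f\!\left(\sum_{i=0}^{H} w_{ik}^{ho}\, O_i^{h}\right) \qquad (5.4)$$

where the function $f(A)$ is given as

$$f(A) = \frac{1}{1 + e^{-KA}} \qquad (5.5)$$

Now, similarly, the activation and output values for the neurons of the hidden layers and the input layer can be written as

$$A_k^{h} = \sum_{i=0}^{N} w_{ik}^{ih}\, O_i \qquad (5.6)$$

$$O_k^{h} = f\!\left(A_k^{h}\right) = \frac{1}{1 + e^{-K A_k^{h}}} \qquad (5.7)$$

and

$$O_k^{i} = A_k^{i} = I_k \qquad (5.8)$$

where $O_i$ is the output of neuron i in the preceding layer.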
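A forward pass through the network, using the sigmoid output function f(A) = 1/(1 + e^(-KA)), can be sketched as below. Assumptions, labelled as such: the adaptation rate K = 3.0 from Table 5.3 is taken to be the gain of the sigmoid, each bias is added separately (equivalent to a weight on a constant input of 1), input-layer outputs equal the inputs, and the `Layer` layout and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (weights, biases): weights[j][i] connects neuron i of the previous layer
# to neuron j of this layer; biases[j] is neuron j's bias.
Layer = Tuple[List[List[float]], List[float]]

def f(a: float, k: float = 3.0) -> float:
    """Sigmoid output function f(A) = 1 / (1 + exp(-K * A))."""
    return 1.0 / (1.0 + math.exp(-k * a))

def forward(x: Sequence[float], layers: Sequence[Layer],
            k: float = 3.0) -> List[float]:
    """Propagate one input pattern through the hidden and output layers.

    The input layer passes its inputs through unchanged; every later layer
    computes A_j = bias_j + sum_i w_ji * O_i and outputs f(A_j).
    """
    out = list(x)
    for weights, biases in layers:
        activations = [b + sum(w * o for w, o in zip(row, out))
                       for row, b in zip(weights, biases)]
        out = [f(a, k) for a in activations]
    return out

if __name__ == "__main__":
    # A 4-6-6-5 network with all weights 0.1 and zero biases (illustrative),
    # fed with the straight-sample density values from Table 5.1.
    net = [([[0.1] * 4 for _ in range(6)], [0.0] * 6),
           ([[0.1] * 6 for _ in range(6)], [0.0] * 6),
           ([[0.1] * 6 for _ in range(5)], [0.0] * 5)]
    print(forward([2.466523, 2.190908, 2.350251, 2.361397], net))
```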
For on-line training [13], the weight populations were changed according to the error calculated in the network after each training iteration. The change in weights and the error in the network are calculated as follows:

$$\Delta w_{ho}(s+1) = -\eta \sum_{p=1}^{P} \frac{\partial E_p}{\partial w_{ho}}(s) + \frac{\alpha\,\Delta w_{ho}(s)}{1 - \alpha\,\Delta w_{ho}(s)} \qquad (5.9)$$

$$\Delta w_{ih}(s+1) = -\eta \sum_{p=1}^{P} \frac{\partial E_p}{\partial w_{ih}}(s) + \frac{\alpha\,\Delta w_{ih}(s)}{1 - \alpha\,\Delta w_{ih}(s)} \qquad (5.10)$$

$$E = \sum_{p=1}^{P} E_p = \sum_{p=1}^{P} \left[O_p^{o} - T_p\right]^{2} \qquad (5.11)$$

where $\left[O_p^{o} - T_p\right]^{2}$ is the squared difference between the actual output value of the output layer for pattern p and the target output value. Doug's momentum term was used together with the momentum descent term to calculate the change in weights in equations (5.9) and (5.10). Doug's momentum descent is similar to standard momentum descent, except that the pre-momentum weight step vector is bounded so that its length cannot exceed 1.0. After the momentum is added, the length of the resulting weight change vector can grow as high as 1/(1 - momentum). This allows stable behaviour with much higher initial learning rates, so there is less need to adjust the learning rate as training progresses. The evolutionary algorithms evolve the population of weights using their operators and select the weights that minimize the error between the desired output and the actual output of the neural network.
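The verbal description of Doug's momentum above can be turned into a short sketch. This Python function follows that description only, as an illustration rather than the exact update rule of the source: the pre-momentum step -η∇E is rescaled to a length of at most 1.0, and then α times the previous step is added. The defaults η = 0.1 and α = 0.9 are taken from Table 5.3; the function name is hypothetical. With a constant, large gradient the step length settles at 1/(1 - α) = 10, which matches the bound quoted in the text.

```python
import math
from typing import List, Sequence

def dougs_momentum_step(grad: Sequence[float], prev_step: Sequence[float],
                        lr: float = 0.1, alpha: float = 0.9) -> List[float]:
    """Weight change following the verbal description of Doug's momentum.

    1. Pre-momentum step: -lr * grad, rescaled if needed so that its
       Euclidean length does not exceed 1.0.
    2. Add the momentum term alpha * prev_step.
    """
    raw = [-lr * g for g in grad]
    norm = math.sqrt(sum(v * v for v in raw))
    if norm > 1.0:
        raw = [v / norm for v in raw]
    return [r + alpha * p for r, p in zip(raw, prev_step)]

if __name__ == "__main__":
    step = [0.0, 0.0]
    for _ in range(200):
        # A large, constant gradient: every raw step is clipped to length 1.
        step = dougs_momentum_step([100.0, 0.0], step)
    print(math.hypot(*step))  # approaches 1 / (1 - 0.9) = 10
```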
5.4.3 The Genetic Algorithm Implementation

A mutation operator, which randomly selects a gene in a chromosome and adds a small random value between -1 and 1 to that gene, produces the next-generation population of 107-gene chromosomes. If the mutation operator is applied n times to the old chromosome, the next generated population has size n + 1:

$$C^{new} = \left[\bigcup_{j=1}^{n} \left( \bigcup_{i=1}^{107} c_{j,i} \right)\right] \cup C^{old}, \qquad c_{j,i} = \begin{cases} c_i^{old} + \varepsilon, & i = l \\ c_i^{old}, & i \neq l \end{cases} \qquad (5.12)$$

where $C^{old}$ symbolizes the old 107-gene chromosome, ε symbolizes the small randomly generated value between -1 and 1, l symbolizes the randomly selected gene of the chromosome to which ε is added, and $C^{new}$ symbolizes the next-generation population of chromosomes, i.e. $C^{new} = [C_1, C_2, C_3, \ldots, C_n, C_{n+1}]$. The inner ∪ operator prepares a new chromosome at each iteration of mutation, and the outer ∪ operator builds the new population of chromosomes called $C^{new}$.

Elitism was used when creating each generation so that the genetic operators did not lose good solutions. This involved copying the best-encoded network unchanged into the new population, as in equation (5.12), which includes $C^{old}$ when creating $C^{new}$.
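The mutation operator with elitism can be sketched as follows. This Python version is a hedged illustration: it builds n mutated copies of the old chromosome (one random gene each, perturbed by a value drawn uniformly from [-1, 1]) and appends the unchanged old chromosome, giving n + 1 members. The mutation probability from Table 5.2 is not modelled, and the function name is hypothetical.

```python
import random
from typing import List, Optional

Chromosome = List[float]

def mutate_population(old: Chromosome, n: int,
                      rng: Optional[random.Random] = None) -> List[Chromosome]:
    """Return C_new = [C_1, ..., C_n, C_old] (size n + 1).

    Each C_j is a copy of the old chromosome in which one randomly chosen
    gene l is increased by a random value drawn uniformly from [-1, 1].
    The unchanged old chromosome is appended (elitism).
    """
    rng = rng or random.Random()
    population = []
    for _ in range(n):
        child = list(old)                    # copy, so C_old is not modified
        l = rng.randrange(len(child))        # randomly selected gene
        child[l] += rng.uniform(-1.0, 1.0)   # small random value
        population.append(child)
    population.append(list(old))             # elitism: keep C_old unchanged
    return population

if __name__ == "__main__":
    rng = random.Random(42)
    old = [rng.random() for _ in range(107)]  # initial genes in [0, 1)
    print(len(mutate_population(old, 3, rng)))  # 4
```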

Selection: the chromosome selected from the mutated population $C^{new}$ is the one for which the sum of squared errors of the feed-forward neural network is minimum. That is, the values of every chromosome are assigned in turn to the network architecture as the weights and bias values defined in the chromosome. After the values are assigned, the network produces outputs using them, and for each new chromosome the error is calculated from these outputs as in equation (5.11). The selection operator then picks the chromosome from $C^{new}$ that yields the minimum error for the network.
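Selection by minimum sum of squared errors can be sketched with the network evaluation abstracted into a callback. In this hypothetical Python sketch, `error_of` stands for "assign the chromosome's genes to the network, present all training patterns and return the sum of squared errors"; the error is summed over patterns and output units. The usage example replaces the network with a toy stand-in in which the chromosome itself is treated as the output vector.

```python
from typing import Callable, List, Sequence, Tuple

Chromosome = List[float]

def sum_squared_error(outputs: Sequence[Sequence[float]],
                      targets: Sequence[Sequence[float]]) -> float:
    """Sum over patterns and output units of (actual - target)^2."""
    return sum((o - t) ** 2
               for out, tgt in zip(outputs, targets)
               for o, t in zip(out, tgt))

def select(population: Sequence[Chromosome],
           error_of: Callable[[Chromosome], float]) -> Tuple[Chromosome, float]:
    """Pick the chromosome whose network error is minimum (evaluated once each)."""
    errors = [error_of(c) for c in population]
    best = min(range(len(population)), key=errors.__getitem__)
    return population[best], errors[best]

if __name__ == "__main__":
    # Toy stand-in for "decode the chromosome, run the network, compute SSE".
    target = [1.0, 2.0]
    pop = [[0.0, 0.0], [0.9, 2.1], [3.0, 3.0]]
    print(select(pop, lambda c: sum_squared_error([c], [target])))
```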

The crossover operator takes the selected chromosome and creates children to produce the next-generation population of 107-gene chromosomes of size n + 1; this next-generation population is produced by applying the crossover operator n times. In the experiments, crossover was performed by swapping the values of two randomly selected genes of the parent chromosome, as shown in Figure 5.4 and equation (5.13):

$$C^{next} = \left[\bigcup_{i=1}^{n} C_i^{child}\right] \cup C^{sel}, \qquad c^{child}_{i,\alpha} = c^{sel}_{\beta}, \quad c^{child}_{i,\beta} = c^{sel}_{\alpha} \qquad (5.13)$$

where α and β symbolize the randomly generated gene positions $CP_1$ and $CP_2$ in the $C^{sel}$ chromosome (all other genes are copied unchanged), and $C^{next}$ is the next generation, of size n + 1.
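The two-point swap crossover can be sketched the same way. In this illustrative Python function (name hypothetical), each of the n children is a copy of the selected chromosome with the values at two distinct random positions CP1 and CP2 exchanged; appending the unchanged parent to reach the population size n + 1 is an assumption about how the extra member arises.

```python
import random
from typing import List, Optional

Chromosome = List[float]

def crossover_population(selected: Chromosome, n: int,
                         rng: Optional[random.Random] = None) -> List[Chromosome]:
    """Return a next generation of size n + 1 built from one parent.

    Each of the n children copies the selected chromosome and swaps the
    values at two distinct random gene positions (CP1, CP2).  The parent
    itself is appended as the (n + 1)-th member (an assumption).
    """
    rng = rng or random.Random()
    population = []
    for _ in range(n):
        child = list(selected)
        cp1, cp2 = rng.sample(range(len(child)), 2)  # two distinct positions
        child[cp1], child[cp2] = child[cp2], child[cp1]
        population.append(child)
    population.append(list(selected))
    return population

if __name__ == "__main__":
    parent = [float(i) for i in range(107)]
    children = crossover_population(parent, 4, random.Random(7))
    print(len(children))  # 5
```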


Figure 5.4: (A) Chromosome $C^{sel}$ before applying the crossover operator, with crossover points $CP_1$ and $CP_2$; (B) applying the crossover operator on the chromosome; (C) chromosome after applying the crossover operator.


A fitness evaluation function evaluates the performance of a chromosome; it must estimate the performance of the weight population of a given feed-forward neural network. A simple function defined in terms of the sum of squared errors is applied. To evaluate the fitness of a given chromosome, each weight and bias value contained in the chromosome is assigned to the respective link and neuron of the network. The training set is then presented to the network, and the sum of squared errors is calculated. The smaller the sum, the higher the fitness of the chromosome. In other words, the genetic algorithm attempts to find a set of weights and bias values that minimizes the sum of squared errors:

$$\mathit{minerror} = 1.0, \qquad C^{min} = \arg\min_{1 \le i \le n+1} E_i$$

where $E_i$ symbolizes the error calculated for chromosome $C_i$ among the n + 1 chromosomes of the $C^{next}$ population, and $C^{min}$ symbolizes the chromosome with the minimized error.
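The minimum-error scan that starts from minerror = 1.0 can be sketched as a loop. In this hedged Python version, chromosomes whose error is not below the running minimum are ignored, so if no chromosome beats the initial 1.0 the function returns None. The fitness mapping 1/(1 + SSE) is a hypothetical example of "the smaller the sum, the higher the fitness", not the chapter's own formula, and both function names are illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Chromosome = List[float]

def fitness(sse: float) -> float:
    """Hypothetical fitness: the smaller the SSE, the higher the fitness."""
    return 1.0 / (1.0 + sse)

def min_error_chromosome(population: Sequence[Chromosome],
                         error_of: Callable[[Chromosome], float],
                         min_error: float = 1.0
                         ) -> Tuple[Optional[Chromosome], float]:
    """Scan the population, keeping the chromosome with the smallest error.

    The running minimum starts at min_error = 1.0; if no chromosome has an
    error below it, (None, 1.0) is returned.
    """
    best: Optional[Chromosome] = None
    for chrom in population:
        e = error_of(chrom)
        if e < min_error:
            min_error, best = e, chrom
    return best, min_error

if __name__ == "__main__":
    err_table = [0.4, 0.05, 2.0]          # toy errors, indexed by first gene
    pop = [[0.0], [1.0], [2.0]]
    print(min_error_chromosome(pop, lambda c: err_table[int(c[0])]))
```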

5.5 Results and Discussions 

The results of the experiment, shown below, consist of eight tables (5.5 to 5.12) and sixteen figures (5.5 to 5.20). Tables 5.5 and 5.6 contain the average numbers of iterations performed by the hybrid evolutionary algorithm and the random genetic algorithm, respectively, for the training pattern set of straight alphabets and of scaled & rotated alphabets. Tables 5.7 and 5.8 contain the average numbers of convergence weight matrices obtained by the hybrid evolutionary algorithm and the random genetic algorithm, respectively, for the training pattern set of straight alphabets and of scaled & rotated alphabets. Tables 5.9 and 5.10 contain the average numbers of iterations performed by the hybrid evolutionary algorithm and the random genetic algorithm, respectively, for the test pattern set of straight alphabets and of scaled & rotated alphabets. Tables 5.11 and 5.12 contain the average numbers of convergence weight matrices obtained by the hybrid evolutionary algorithm and the random genetic algorithm, respectively, for the test pattern set of straight alphabets and of scaled & rotated alphabets. Figures 5.5 to 5.12 show comparison charts, based on Tables 5.5 to 5.12, between the performance of training and recognition for straight and for scaled & rotated patterns. Figures 5.13 to 5.16 show comparison charts between the performance of the algorithms for the training pattern set and for the test pattern set, and Figures 5.17 to 5.20 show comparison charts between the performances of the algorithms used.
Table 5.5: The iterations performed for the training set of straight and scaled & rotated alphabets by the hybrid evolutionary algorithm.

Alphabet  Avg.            Sample  Scaling   Rotation  Avg. iterations
          iterations                        (degrees) after scaling and
          (straight, 5                                rotation (5 trials)
          trials × 5
          samples)
A         27              A1      2 by 3    0         15
B         16              B2      2 by 2    50        78
C         88              C3      3 by 3    35        5
D         32              D4      1 by 2    22        356
E         20              E2      2 by 2    65        1003
F         14              F2      2 by 1    45        252
G         12              G1      3 by 3    20        81
H         14              H4      1 by 1    35        16
I         31              I1      3 by 2    25        26
J         77              J2      2 by 3    30        119
K         56              K3      2 by 3    20        (illegible)
L         20              L4      2 by 2    35        80
M         30              M2      3 by 3    25        18
N         63              N2      2 by 3    20        69
O         4               O1      2 by 2    25        14
P         (illegible)     P4      2 by 1    65        30
Table 5.6: The iterations performed for the training set of straight and scaled & rotated alphabets by the random genetic algorithm.

Alphabet  Avg.            Sample  Scaling   Rotation  Avg. iterations
          iterations                        (degrees) after scaling and
          (straight, 5                                rotation (5 trials)
          trials × 5
          samples)
A         1552            A1      2 by 3    0         814
B         610             B2      2 by 2    50        159
C         678             C3      3 by 3    35        1132
D         400             D4      1 by 2    22        1077
E         1161            E2      2 by 2    65        364
F         569             F2      2 by 1    45        126
G         67              G1      3 by 3    20        663
H         2482            H4      1 by 1    35        271
I         384             I1      3 by 2    25        108
J         1365            J2      2 by 3    30        684
K         4053            K3      2 by 3    20        426
L         408             L4      2 by 2    35        765
M         2171            M2      3 by 3    25        574
N         53              N2      2 by 3    20        76
O         19              O1      2 by 2    25        81
P         81              P4      2 by 1    65        391
Q         523             Q1      3 by 3    15        1185
R         397             R2      2 by 2    25        1506
S         108             S3      1 by 2    25        524
T         6636            T4      2 by 3    30        122
U         2612            U2      2 by 3    20        411
V         7048            V5      2 by 2    35        2399
W         70              W1      3 by 3    25        81
X         9317            X4      2 by 2    45        975
Y         8385            Y5      3 by 3    15        461
Z         5298            Z3      2 by 3    30        699


Figure 5.6: The comparison chart between the average numbers of iterations performed to recognize the straight and scaled & rotated alphabets by the random genetic algorithm.
Table 5.7: The number of convergence weight vectors obtained for the training set of straight and scaled & rotated alphabets by the hybrid evolutionary algorithm.

Alphabet  Avg. no. of     Sample  Scaling   Rotation  Avg. no. of
          convergence                       (degrees) convergence weight
          weight vectors                              vectors after scaling
          (straight, 5                                and rotation
          trials × 5                                  (5 trials)
          samples)
A         1262            A1      2 by 3    0         368
B         1479            B2      2 by 2    50        749
C         1548            C3      3 by 3    35        373
D         1734            D4      1 by 2    22        1840
E         1474            E2      2 by 2    65        1778
F         1496            F2      2 by 1    45        1112
G         1434            G1      3 by 3    20        746
H         1737            H4      1 by 1    35        1113
I         1752            I1      3 by 2    25        1126
J         1538            J2      2 by 3    30        1091
K         1555            K3      2 by 3    20        381
L         1403            L4      2 by 2    35        1375
M         1260            M2      3 by 3    25        363
N         1337            N2      2 by 3    20        1091
O         1394            O1      2 by 2    25        385
P         1756            P4      2 by 1    65        1448
Q         1411            Q1      3 by 3    15        1107
R         1292            R2      2 by 2    25        1082
S         1601            S3      1 by 2    25        1479
T         1216            T4      2 by 3    30        730
U         1284            U2      2 by 3    20        1445
V         1648            V5      2 by 2    35        53.2
W         1410            W1      3 by 3    25        409
X         1358            X4      2 by 2    45        1391
Y         1297            Y5      3 by 3    15        1055
Z         1692            Z3      2 by 3    30        665


Figure 5.7: The comparison chart between the average numbers of convergence weight matrices obtained to recognize the straight and scaled & rotated alphabets by the hybrid evolutionary algorithm.
Table 5.8: The number of convergence weight vectors obtained for the training set of straight and scaled & rotated alphabets by the random genetic algorithm.

Alphabet  Avg. no. of     Sample  Scaling   Rotation  Avg. no. of
          convergence                       (degrees) convergence weight
          weight vectors                              vectors after scaling
          (straight, 5                                and rotation
          trials × 5                                  (5 trials)
          samples)
A         848             A1      2 by 3    0         1027
B         1037            B2      2 by 2    50        521
C         916             C3      3 by 3    35        940
D         981             D4      1 by 2    22        854
E         1193            E2      2 by 2    65        776
F         849             F2      2 by 1    45        794
G         910             G1      3 by 3    20        936
H         811             H4      1 by 1    35        546
I         1046            I1      3 by 2    25        580
J         981             J2      2 by 3    30        732
K         1209            K3      2 by 3    20        1143
L         (illegible)     L4      2 by 2    35        544
M         850             M2      3 by 3    25        960
N         646             N2      2 by 3    20        575
O         929             O1      2 by 2    25        866
P         512             P4      2 by 1    65        762
Q         953             Q1      3 by 3    15        864
R         1116            R2      2 by 2    25        589
S         1186            S3      1 by 2    25        1055
T         1067            T4      2 by 3    30        280
U         890             U2      2 by 3    20        559
V         1137            V5      2 by 2    35        910
W         898             W1      3 by 3    25        655
X         1018            X4      2 by 2    45        1013
Y         997             Y5      3 by 3    15        278
Z         1068            Z3      2 by 3    30        593


Figure 5.8: The comparison chart between the average numbers of convergence weight matrices obtained to recognize the straight and scaled & rotated alphabets by the genetic algorithm.
Table 5.9: The average iterations over five trials performed for the test pattern set of straight and scaled & rotated alphabets by the hybrid evolutionary algorithm.

Alphabet  Avg.            Scaling   Rotation  Avg. iterations
          iterations                (degrees) after scaling and
          (straight test)                     rotation
A         73              2 by 1    45        4
B         1               2 by 3    25        2
C         2               2 by 2    36        9
D         4               3 by 3    45        9
E         2               3 by 3    55        6
F         4               2 by 2    35        1
G         395             2 by 2    30        2
H         7               2 by 3    35        37
I         1               2 by 2    45        1
J         1               2 by 2    35        1
K         24              2 by 2    30        1
L         2               2 by 3    15        3
M         23              2 by 2    20        1
N         2               3 by 3    25        (illegible)
O         2               3 by 3    45        (illegible)
P         21              2 by 2    35        2
Q         1               2 by 2    20        2
Figure 5.9: The comparison chart between the iterations performed to recognize the test patterns of straight and scaled & rotated alphabets by the hybrid evolutionary algorithm (series: average iterations of straight test alphabets; average iterations after scaling and rotation).
Table 5.10: The average iterations over five trials performed for the test pattern set of straight and scaled & rotated alphabets by the genetic algorithm.

Alphabet  Avg.            Scaling   Rotation  Avg. iterations
          iterations                (degrees) after scaling and
          (straight test)                     rotation
A         68              2 by 1    45        320
B         47              2 by 3    25        43
C         3207            2 by 2    36        831
D         189             3 by 3    45        4
E         240             3 by 3    55        4
F         95              2 by 2    35        382
G         113             2 by 2    30        1
H         72              2 by 3    35        15
I         2309            2 by 2    45        68
J         45              2 by 2    35        1
K         9               2 by 2    30        45
L         50              2 by 3    15        58
M         120             2 by 2    20        12
N         32              3 by 3    25        24
O         24              3 by 3    45        (illegible)
P         34              2 by 2    35        1
Q         22              2 by 2    20        2
R         588             3 by 2    65        232
S         29              2 by 2    45        89
T         114             2 by 2    35        1
U         42              2 by 1    25        215
V         23              2 by 3    15        1
W         3               2 by 2    20        33
X         159             2 by 3    55        64
Y         180             3 by 2    30        13
Z         387             2 by 2    45        283


Figure 5.10: The comparison chart between the iterations performed to recognize the test patterns of straight and scaled & rotated alphabets by the random genetic algorithm.
Table 5.11: The average number of convergence weight matrices over five trials obtained for the test pattern set of straight and scaled & rotated alphabets by the hybrid evolutionary algorithm.

Alphabet  Avg. no. of     Scaling   Rotation  Avg. no. of
          convergence               (degrees) convergence weight
          weight matrices                     matrices after scaling
          (straight test)                     and rotation
A         1121            2 by 1    45        4
B         1               2 by 3    25        2
C         27              2 by 2    36        9
D         2               3 by 3    45        9
E         57              3 by 3    55        6
F         10              2 by 2    35        1
G         1705            2 by 2    30        2
H         1               2 by 3    35        37
I         1               2 by 2    45        1
J         14              2 by 2    35        1
K         1904            2 by 2    30        1
L         1831            2 by 3    15        3
M         1849            2 by 2    20        1
N         8               3 by 3    25        1
O         1794            3 by 3    45        1
P         1828            2 by 2    35        2
Q         11              2 by 2    20        2
R         4               3 by 2    65        21
S         1790            2 by 2    45        51
T         1783            2 by 2    35        19
U         1748            2 by 1    25        6
V         1790            2 by 3    15        1
W         26              2 by 2    20        1
X         9               2 by 3    55        1
Y         1874            3 by 2    30        1
Z         1               2 by 2    45        1


Figure 5.11: The comparison chart between the numbers of convergence weight matrices obtained to recognize the test patterns of straight and scaled & rotated alphabets by the hybrid evolutionary algorithm.
Table 5.12: The average number of convergence weight matrices over five trials obtained for the test pattern set of straight and scaled & rotated alphabets by the genetic algorithm.

Alphabet  Avg. no. of     Scaling   Rotation  Avg. no. of
          convergence               (degrees) convergence weight
          weight matrices                     matrices after scaling
          (straight test)                     and rotation
A         2               2 by 1    45        1003
B         3               2 by 3    25        858
C         1               2 by 2    36        848
D         4               3 by 3    45        804
E         5               3 by 3    55        1029
F         1               2 by 2    35        1027
G         1               2 by 2    30        604
H         10              2 by 3    35        596
I         905             2 by 2    45        769
J         1               2 by 2    35        692
K         1               2 by 2    30        874
L         1               2 by 3    15        1031
M         1               2 by 2    20        683
N         6               3 by 3    25        745
O         2               3 by 3    45        644
P         2               2 by 2    35        713
Q         9               2 by 2    20        1163
R         1               3 by 2    65        873
S         5               2 by 2    45        1373
T         5               2 by 2    35        270
U         3               2 by 1    25        664
V         13              2 by 3    15        924
W         1               2 by 2    20        1168
X         1               2 by 3    55        1400
Y         11              3 by 2    30        1178
Z         1               2 by 2    45        1182


Figure 5.12: The comparison chart between the iterations performed by the random genetic algorithm to recognize the test patterns of straight and scaled & rotated alphabets.
Figure 5.13: The comparison chart between the iterations performed by the hybrid evolutionary algorithm to recognize the training set patterns and the test set patterns for scaled & rotated alphabets.

[Chart: iterations for training set patterns and iterations for test set patterns, plotted for alphabets A to Z; vertical axis from 0 to 1200 iterations.]



Figure 5.14: The comparison chart between the iterations performed by the genetic algorithm to recognize the training set patterns and the test set patterns for scaled & rotated alphabets.
Figure 5.15: The comparison chart between the numbers of convergence weight matrices obtained by the hybrid evolutionary algorithm to recognize the training set patterns and the test set patterns for scaled & rotated alphabets.




Figure 5.16: The comparison chart between the numbers of convergence weight matrices obtained by the genetic algorithm to recognize the training set patterns and the test set patterns for scaled & rotated alphabets.
Figure 5.17: The comparison chart between the numbers of iterations performed by the random genetic algorithm and by the hybrid evolutionary algorithm to recognize the training set patterns for scaled & rotated alphabets.



Figure 5.18: The comparison chart between the numbers of iterations performed by the random genetic algorithm and by the hybrid evolutionary algorithm to recognize the test set patterns for scaled & rotated alphabets.
Figure 5.19: The comparison chart between the numbers of convergence weight matrices obtained by the random genetic algorithm and by the hybrid evolutionary algorithm to recognize the training set patterns for scaled & rotated alphabets.



Figure 5.20: The comparison chart between the numbers of convergence weight matrices obtained by the random genetic algorithm and by the hybrid evolutionary algorithm to recognize the test set patterns for scaled & rotated alphabets.
5.6 Conclusion

The results and comparison charts above show that online training with the genetic algorithm and the hybrid evolutionary algorithm offers good scope for solving problems such as the recognition of scaled and rotated handwritten English alphabets. Recognition is achieved in very few iterations through the network trained on straight handwritten English alphabets, which suggests that the performance of online training is good enough.

By interpreting the results we can draw the following conclusions: 

1. For both the hybrid evolutionary algorithm and the genetic algorithm, training on the scaled and rotated patterns took fewer iterations and obtained a higher number of weight matrices than training on the straight patterns, as shown in Figures 5.5 to 5.8. This indicates that the network is trained accurately and can easily adapt to new, similar patterns.

2. Recognition of the test patterns of scaled and rotated alphabets also took fewer iterations and obtained a higher number of weight matrices than recognition of the test patterns of straight alphabets, as shown in Figures 5.9 to 5.12.

3. Comparing the results for training and recognition of the scaled and rotated patterns, as shown in Figures 5.13 to 5.16, the network takes fewer iterations to recognize a pattern than the number of iterations performed to train the network, and it also obtains a higher number of convergence weight matrices. This leads to the same conclusion: the training of the network is accurate and reliable.

4. Comparing the results of the hybrid evolutionary algorithm and the genetic algorithm (Figures 5.17 to 5.20), the hybrid evolutionary algorithm is found to perform far better than the genetic algorithm.

5.7 Future Work 

The successful implementation of feed-forward neural networks with evolutionary algorithms for handwritten English alphabets (straight, and scaled and/or rotated) motivates applying the same approach to other complex pattern recognition problems. The work is expected to be extended in future to the following problems:

1. Recognition of overlapped alphabets.

2. Refinement of training for incomplete training sets arising from vagueness, fuzziness, or some other complexity in the samples.